scipy.optimize.curve_fit : Not able to do a curve fitting - python

I am still new to Python and I have a problem with curve fitting. The following program is a simplification of a bigger program that I created, but it reproduces the problem that I have.
The problem is that I cannot fit a curve to a function that I called burger. The line y=np.sqrt(y) is the problem: when I remove it, the fit works perfectly, but then it is not the function I want.
How can I fit this function with y=np.sqrt(y) included?
# -*- coding: utf-8 -*-
"""
Created on Wed Dec 11 22:14:54 2013
#author:
"""
import numpy as np
import matplotlib.pyplot as plt
import pdb
import scipy.optimize as optimization
from math import *
from scipy.optimize import curve_fit
import math
import moyenne
####################Function Burger###############################
def burger(t, E1, E2, N,tau):
    nu=0.4 #Poisson's ratio
    P=50 #Peak force
    alpha=70.3 #Tip angle
    y=((((pi/2.)*P*(1.-nu**2.))/(tan(alpha)))*(1./E1 + 1./E2*(1.-np.exp(-t/tau)) + 1./((N)*(1.-nu))*t))
    y=np.sqrt(y)
    return y
#######usage example##########
xlist=np.linspace(0,1,100)
ylist=[ burger(t,3, 2,1,0.1) for t in xlist]
#pdb.set_trace()
pa,j = curve_fit(burger,xlist,ylist)
yfit=[burger(x,*pa) for x in xlist]
plt.figure()
plt.plot(xlist,ylist,marker='o')
plt.plot(xlist,yfit)
plt.show()

So, this probably won't be the best answer you get, but while you wait for others here are some things to think about.
First, since you are new to python maybe you don't know, or maybe there is reason to solve these things in list comprehension, but I don't think you need the list comprehensions. You can use the numpy math operations to handle a whole array at a time. Instead of
y=((((pi/2.)*P*(1.-nu**2.))/(tan(alpha)))* ...
You can write
y = ((((np.pi/2.)*P*(1.-nu**2.))/(np.tan(alpha)))* ...
Then instead of
[ burger(t, 3., 2., 1., 0.1) for t in xlist]
you can do
burger(xlist, 3., 2., 1., 0.1)
This will be a lot faster when you are working with arrays.
Secondly, I looked at what the algorithm was doing: it wasn't searching for your parameters in the right ranges. I looked up the algorithm it uses (Levenberg-Marquardt) on the scipy.optimize page, and Wikipedia says that convergence depends on the initial guess and that it finds a local, not global, minimum. (Sometimes your code hit negative values for the parameters, which made the sqrt of y undefined in some cases.) If you can give it a good initial guess, it should work ([1., 3., 3., 2] worked for me). The command that solved it for me was: pa,j = curve_fit(burger,xlist,ylist, [1., 3., 3., 2], maxfev=10000).
Thirdly, the first error I got when I ran your code was that it reached the maximum number of function evaluations. Add maxfev=10000 (or more if you need) as the last argument to curve_fit.
Check it out. If you can give your bigger problem an initial guess then maybe you'll get it to converge. Otherwise maybe a different algorithm could be more suitable?
Update: See this question for a more detailed explanation of why this works, but you can get it to work without an initial guess if you pass curve_fit another keyword argument, diag.
Use:
pa,j = curve_fit(burger,xlist,ylist, diag=(1./xlist.mean(), 1./ylist.mean()), maxfev=10000)
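If a good initial guess is hard to come by, another option is to keep the parameters positive with the bounds argument of curve_fit, so that the argument of np.sqrt cannot become negative. This is only a sketch and not part of the original answer: the lower bound of 1e-6 and the vectorized rewrite of burger are assumptions made for illustration, and the starting point simply reuses the guess [1., 3., 3., 2]. Passing bounds makes curve_fit use the 'trf' method instead of Levenberg-Marquardt.

```python
import numpy as np
from scipy.optimize import curve_fit

def burger(t, E1, E2, N, tau):
    """Vectorized version of the model from the question (works on arrays)."""
    nu = 0.4       # Poisson's ratio
    P = 50.0       # peak force
    alpha = 70.3   # tip angle, passed to np.tan unchanged as in the question
                   # (note that np.tan expects radians)
    pref = (np.pi / 2.0) * P * (1.0 - nu**2) / np.tan(alpha)
    creep = 1.0 / E1 + (1.0 / E2) * (1.0 - np.exp(-t / tau)) + t / (N * (1.0 - nu))
    return np.sqrt(pref * creep)

# Synthetic, noise-free data generated with the parameters from the question.
xdata = np.linspace(0, 1, 100)
ydata = burger(xdata, 3.0, 2.0, 1.0, 0.1)

# Assumed starting point and bounds: every parameter is kept strictly positive.
p0 = [1.0, 3.0, 3.0, 2.0]
popt, pcov = curve_fit(burger, xdata, ydata, p0=p0,
                       bounds=(1e-6, np.inf), max_nfev=10000)
print(popt)
```

Because several parameter combinations give nearly the same curve, the fitted values may differ from the ones used to generate the data even when the curve itself matches well.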


Can we modify the solution vector between integrations steps with scipy.integrate.ode, using VODE?

I am trying to solve a stiff ODE problem where, at each integration step, I have to modify the solution vector before continuing the integration.
For that, I am using scipy.integrate.ode, with the integrator VODE, in bdf mode.
Here is a simplified version of the code I am using. The real function is much more complex and involves the use of CANTERA.
from scipy.integrate import ode
import numpy as np
import matplotlib.pyplot as plt
def yprime(t,y):
    return y
vode = ode(yprime)
vode.set_integrator('vode', method='bdf', with_jacobian=True)
y0 = np.array([1.0])
vode.set_initial_value(y0, 0.0)
y_list = np.array([])
t_list = np.array([])
while vode.t<5.0 and vode.successful():
    vode.integrate(vode.t+1e-3,step=True)
    y_list = np.append(y_list,vode.y)
    t_list = np.append(t_list,vode.t)
plt.plot(t_list,y_list)
Output: [plot of y_list against t_list]
So far so good.
Now, the problem is that within each step, I would like to modify y after it has been integrated by VODE. Naturally, I want VODE to keep on integrating with the modified solution.
This is what I have tried so far:
while vode.t<5.0 and vode.successful():
    vode.integrate(vode.t+1e-3,step=True)
    vode.y[0] += 1 # Will change the solution until vode.integrate is called again
    vode._y[0] += 1 # Same here.
I have also tried looking at vode._integrator, but it seems that everything is kept inside the Fortran instance of the solver.
For quick reference, here is the source code of scipy.integrate.ode, and here is the pyf interface scipy is using for VODE.
Has anyone tried something similar? I could also change the solver and/or the wrapper I am using, but I would like to keep on using Python for that.
Thank you very much !
For those getting the same problem, the issue lies in the Fortran wrapper from Scipy.
My solution was to change the package used, from ode to solve_ivp. The difference is that solve_ivp is written entirely in Python, so you can hack your way through the implementation. Note that the code will run slowly compared to the VODE wrapper that the other package uses, even though it is very well written and uses numpy (so, C-level performance wherever possible).
Here are the few steps you will have to follow.
First, to reproduce the already working code :
from scipy.integrate import _ivp # Not supposed to be used directly. Be careful.
import numpy as np
import matplotlib.pyplot as plt
def yprime(t,y):
    return y
y0 = np.array([1.0])
t0 = 0.0
t1 = 5.0
# WITHOUT IN-BETWEEN MODIFICATION
bdf = _ivp.BDF(yprime,t0,y0,t1)
y_list = np.array([])
t_list = np.array([])
while bdf.t<t1:
    bdf.step()
    y_list = np.append(y_list,bdf.y)
    t_list = np.append(t_list,bdf.t)
plt.plot(t_list,y_list)
Output: [plot of y_list against t_list]
Now, to implement a way to modify the values of y between integration steps.
# WITH IN-BETWEEN MODIFICATION
bdf = _ivp.BDF(yprime,t0,y0,t1)
y_list = np.array([])
t_list = np.array([])
while bdf.t<t1:
    bdf.step()
    bdf.D[0] -= 0.1 # The first row of the D matrix (D[0]) holds the value of your vector y.
    # By modifying the first row, you modify the solution at this instant.
    y_list = np.append(y_list,bdf.y)
    t_list = np.append(t_list,bdf.t)
plt.plot(t_list,y_list)
Gives the plot: [plot of the modified y_list against t_list]
Unfortunately, this has no physical meaning for this problem, but it works for the moment.
Note: It is entirely possible that the solver becomes unstable. This has to do with the Jacobian not being updated at the right time, so one would have to recalculate it, which is expensive most of the time. A better solution would be to rewrite the BDF class so that the modification is applied before the Jacobian matrix is updated.
Source code here.
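For completeness, here is an alternative that stays with scipy.integrate.ode and its public API. It is a sketch rather than part of the answer above: it assumes it is acceptable to restart the integrator after every modification by calling set_initial_value with the modified vector. That discards VODE's multistep history, so it may cost accuracy or speed on a genuinely stiff problem. The halving of y and the interval length dt are made up for illustration.

```python
import numpy as np
from scipy.integrate import ode

def yprime(t, y):
    return y

solver = ode(yprime)
solver.set_integrator('vode', method='bdf', with_jacobian=True)
solver.set_initial_value(np.array([1.0]), 0.0)

dt = 0.2          # assumed interval between two modifications
n_intervals = 5
t_list, y_list = [], []
for k in range(n_intervals):
    solver.integrate((k + 1) * dt)        # advance to the end of the interval
    if not solver.successful():
        break
    y_modified = 0.5 * solver.y           # hypothetical modification of the solution
    # Restart the integrator from the modified state at the current time.
    solver.set_initial_value(y_modified, solver.t)
    t_list.append(solver.t)
    y_list.append(y_modified[0])

print(t_list[-1], y_list[-1])
```

With this toy right-hand side, each interval multiplies y by exp(dt) before the halving, which gives an easy way to check that the modified value really is used as the new starting point.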

Fundamental misunderstanding of translation and rotation in PythonOCC (OpenCascade)

Maybe this will help others trying to learn through tutorials/documentation/stackoverflow.
How can I rotate or translate a TopoDS_Shape (or any object), given coordinates, angles, and an axis? For example: if my part is at (5.0, 1.0, 0.0), can I move it to (0.0, 0.0, 0.0)? Or make it face a new direction?
Attempted methods (not including what I think is unimportant code). I've tried to include some of the stuff I've spent most of my time on. Can't recall all of the other attempts I've made. Maybe someone experienced with PythonOCC or OpenCascade can see where I'm going wrong.
display, start_display, add_menu, add_function_to_menu = init_display()
aResShape = openFile.open(fileToOpen) #RETURNS SHAPE FROM STEP FILE
aResShape.Orientable(True)
#EXAMPLE
aResShape.Location().Transformation().SetRotation(gp_Quaternion(1., 1., 0., 1.))
#EXAMPLE
aResShape.Location().Transformation().SetTransformation(a,b)
#EXAMPLE
aResShape.Move(TopLoc_Location(gp_Trsf( gp_Trsf2d(1., 0.) )))
#EXAMPLE
aResShape.Reverse()
#EXAMPLE
p1 = gp_Pnt(700., 10., 80.)
d1 = gp_Dir(50., 50., 60.)
a = gp_Ax3(p1, d1)
p2 = gp_Pnt(2., 3., 4.)
d2 = gp_Dir(4., 5., 6.)
b = gp_Ax3(p2, d2)
print(aResShape.Location().Transformation().Transforms())
aResShape.Location().Transformation().SetTransformation(a,b)
print(aResShape.Location().Transformation().Transforms()) #RETURNS SAME VALUES
#EXAMPLE (TRYING TO SEE WHAT WORKS)
transform = gp_Trsf
transform.SetRotation(
    gp_Ax1(
        gp_Pnt(0.,0.,0.),
        gp_Dir(0.,0.,1.)
    ),
    1.570796
)
print(transform)
display.DisplayShape(aResShape, color='Black', update=True)
display.FitAll()
display.SetModeWireFrame()
start_display()
Sometimes I'll get errors like this:
NotImplementedError: Wrong number or type of arguments for overloaded function 'new_gp_Trsf2d'.
Possible C/C++ prototypes are:
gp_Trsf2d::gp_Trsf2d()
gp_Trsf2d::gp_Trsf2d(gp_Trsf const &)
But most of the time I get nothing and the shape doesn't change in the display.
Spent days in here:
https://cdn.rawgit.com/tpaviot/pythonocc-core/804f7f3/doc/apidoc/0.18.1/index.html
https://dev.opencascade.org/doc/refman/html/index.html
https://github.com/tpaviot/pythonocc-demos/tree/master/examples
So I know what functions to pass I think, but nothing seems to work out.
Maybe the display simply isn't showing me changes that are actually happening?
I asked a different PythonOCC question earlier (pythonOCC set default units to inches) but I think I'm really just missing something basic.
Can anyone think of why I'm not managing to make any real changes? Thanks for your time!
I use Open Cascade from C++, where BRepBuilderAPI_Transform(const TopoDS_Shape &S, const gp_Trsf &T, const Standard_Boolean Copy=Standard_False) performs the transformation. See:
https://www.opencascade.com/doc/occt-6.9.1/refman/html/class_b_rep_builder_a_p_i___transform.html
This is how I used it:
gp_Trsf trsf;
trsf.SetTransformation(gp_Quaternion(gp_Mat(gp_XYZ(d.x1, d.y1, d.z1), gp_XYZ(d.x2, d.y2, d.z2), gp_XYZ(d.x1, d.y1, d.z1).Crossed(gp_XYZ(d.x2, d.y2, d.z2)))), gp_Vec(d.x, d.y, d.z));
*d.shape = BRepBuilderAPI_Transform(*d.shape, trsf, true);

`ValueError: A value in x_new is above the interpolation range.` - what other reasons than not ascending values?

I receive this error in scipy interp1d function. Normally, this error would be generated if the x was not monotonically increasing.
import numpy as np
import scipy.interpolate as spi
def refine(coarsex,coarsey,step):
    finex = np.arange(min(coarsex),max(coarsex)+step,step)
    intfunc = spi.interp1d(coarsex, coarsey,axis=0)
    finey = intfunc(finex)
    return finex, finey
for num, tfile in enumerate(files):
    tfile = tfile.dropna(how='any')
    x = np.array(tfile['col1'])
    y = np.array(tfile['col2'])
    finex, finey = refine(x,y,0.01)
The code is correct, because it successfully worked on 6 data files and threw the error for the 7th. So there must be something wrong with the data. But as far as I can tell, the data increase all the way down.
I am sorry for not providing an example, because I am not able to reproduce the error on an example.
There are two things that could help me:
1. Some brainstorming: if the data are indeed monotonically increasing, what else could produce this error? Another hint, regarding the decimals, could be in this question, but I think my solution (the min and max of x) is robust enough to avoid it. Or isn't it?
2. Is it possible (and how) to return the value of x_new and its index when throwing the ValueError: A value in x_new is above the interpolation range. so that I could actually see where in the file the problem is?
UPDATE
So the problem is that, for some reason, max(finex) is larger than max(coarsex) (one is .x39 and the other is .x4). I hoped rounding the original values to 2 significant digits would solve the problem, but it didn't: it displays fewer digits but still computes with the undisplayed ones. What can I do about it?
If you are running SciPy v. 0.17.0 or newer, then you can pass fill_value='extrapolate' to spi.interp1d, and it will extrapolate to accommodate the values of yours that lie outside the interpolation range. So define your interpolation function like so:
intfunc = spi.interp1d(coarsex, coarsey,axis=0, fill_value="extrapolate")
Be forewarned, however!
Depending on what your data looks like and the type of interpolation you are performing, the extrapolated values can be erroneous. This is especially true if you have noisy or non-monotonic data. In your case you might be OK because your x_new value is only slightly beyond your interpolation range.
Here's a simple demonstration of how this feature can work nicely but also give erroneous results.
import scipy.interpolate as spi
import numpy as np
x = np.linspace(0,1,100)
y = x + np.random.randint(-1,1,100)/100
x_new = np.linspace(0,1.1,100)
intfunc = spi.interp1d(x,y,fill_value="extrapolate")
y_interp = intfunc(x_new)
import matplotlib.pyplot as plt
plt.plot(x_new,y_interp,'r', label='interp/extrap')
plt.plot(x,y, 'b--', label='data')
plt.legend()
plt.show()
So the interpolated portion (in red) worked well, but the extrapolated portion clearly fails to follow the otherwise linear trend in this data because of the noise. So have some understanding of your data and proceed with caution.
A quick test of your finex calculation shows that it can (always?) get into the extrapolation region.
In [124]: coarsex=np.random.rand(100)
In [125]: max(coarsex)
Out[125]: 0.97393109991816473
In [126]: step=.01;finex=np.arange(min(coarsex), max(coarsex)+step, step);(max(
...: finex),max(coarsex))
Out[126]: (0.98273730602114795, 0.97393109991816473)
In [127]: step=.001;finex=np.arange(min(coarsex), max(coarsex)+step, step);(max
...: (finex),max(coarsex))
Out[127]: (0.97473730602114794, 0.97393109991816473)
Again it is a quick test, and may be missing some critical step or value.
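One way to keep finex inside the data range, as a sketch: build the grid from an integer number of steps and clip it to [min, max], instead of asking np.arange to stop at max(coarsex)+step. The helper out_of_range is a hypothetical name, added here to address the question of which values of x_new trigger the ValueError; the sample arrays are made up.

```python
import numpy as np
from scipy.interpolate import interp1d

def out_of_range(x_new, x):
    """Return (indices, values) of the entries of x_new outside [min(x), max(x)]."""
    x_new = np.asarray(x_new, dtype=float)
    bad = np.flatnonzero((x_new < np.min(x)) | (x_new > np.max(x)))
    return bad, x_new[bad]

def refine(coarsex, coarsey, step):
    """Interpolate onto a grid that is guaranteed to stay inside the data range."""
    coarsex = np.asarray(coarsex, dtype=float)
    lo, hi = coarsex.min(), coarsex.max()
    n = int(np.floor((hi - lo) / step)) + 1             # whole steps that fit
    finex = np.clip(lo + step * np.arange(n), lo, hi)   # guard against round-off
    intfunc = interp1d(coarsex, coarsey, axis=0)
    return finex, intfunc(finex)

# Made-up sample data: y is exactly 2*x, so linear interpolation reproduces it.
coarsex = np.array([0.0, 0.13, 0.39])
coarsey = 2.0 * coarsex
finex, finey = refine(coarsex, coarsey, 0.01)

# Locate offending values in a candidate grid before calling the interpolator.
bad_idx, bad_val = out_of_range([0.1, 0.5, 0.39], coarsex)
print(bad_idx, bad_val)
```

Calling out_of_range(finex, coarsex) on the grid built in the question, just before intfunc(finex), would show which grid points and indices fall outside the data.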

Scipy curve_fit fails for data with sine function

I'm trying to fit a curve through some data. The function I'm trying to fit is as follows:
def f(x,a,b,c):
    return a +b*x**c
When using scipy.optimize.curve_fit I do not get any results: It returns the (default) initial parameters:
(array([ 1., 1., 1.]),
array([[ inf, inf, inf],
[ inf, inf, inf],
[ inf, inf, inf]]))
I've tried reproducing the data, and found that a sine function was causing the problem (the data contains daily variation):
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
xdata=np.random.rand(1000) + 0.002 *np.sin(np.arange(1000)/(1.5*np.pi))
ydata=0.1 + 23.4*xdata**0.56 + np.random.normal(0,2,1000)
def f(x,a,b,c):
    return a +b*x**c
fit=curve_fit(f,xdata,ydata)
fig,ax=plt.subplots(1,1)
ax.plot(xdata,ydata,'k.',markersize=3)
ax.plot(np.arange(0,1,.01), f(np.arange(0,1,.01),*fit[0]))
fig.show()
I would obviously expect curve_fit to return something close to [0.1, 23.4, .56].
Note that the sine function does not really seem to affect the data ('xdata') in value, as the first term of xdata ranges between 0 and 1 and I'm adding something between -0.002 and +0.002, but it does cause the fitting procedure to fail. I found the value 0.002 to be close to the 'critical' value for failure; if it is smaller the procedure is less likely to fail, and vice versa. At 0.002 the procedure fails about as often as not.
I have tried solving this problem by shuffling the 'xdata' and 'ydata' simultaneously, to no effect. I thought (for no particular reason) that perhaps removing the autocorrelation of the data would solve the problem.
So my question is: how can I fix/bypass this problem? I can change the sine contribution in the synthetic data in the snippet above, but for my real data I obviously cannot.
You can eliminate the NaNs generated by negative x-values within the model function:
def f(x,a,b,c):
    y = a +b*x**c
    y[np.isnan(y)] = 0.0
    return y
Replacing all NaNs by 0 might not be the best choice. You could try neighbour values or do some kind of extrapolation.
If you feed in generated test data you have to make sure that there are no NaNs in there either. So directly after data generation put something like:
if xdata.min() < 0:
    print('expecting NaNs')
ydata[np.isnan(ydata)] = 0.0
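Instead of overwriting NaNs with zeros, the offending points can also be dropped before fitting. The following is a minimal sketch under assumptions that are not in the question: a fixed random seed, one x value forced negative so that the failure mode is reproduced deterministically, and bounds that keep the exponent c in [0, 5].

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b, c):
    return a + b * x**c

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
xdata = rng.random(1000) + 0.002 * np.sin(np.arange(1000) / (1.5 * np.pi))
xdata[0] = -0.001               # force one negative x so a NaN is guaranteed
with np.errstate(invalid='ignore'):
    ydata = 0.1 + 23.4 * xdata**0.56 + rng.normal(0, 2, 1000)

# Keep only points where the power law is defined and the data are finite.
mask = (xdata > 0) & np.isfinite(ydata)

p0 = [1.0, 1.0, 1.0]
popt, pcov = curve_fit(f, xdata[mask], ydata[mask], p0=p0,
                       bounds=([-np.inf, -np.inf, 0.0], [np.inf, np.inf, 5.0]),
                       max_nfev=10000)
print(popt)
```

Masking removes the NaNs at their source, while the bounds on c are an extra safeguard so that x**c stays well behaved for x close to zero.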

Change spacing in Mayavi

I am creating a surf() plot using Mayavi/mlab, but the resulting picture is not really satisfying because the spacing is not good. Here is my code:
import pygrib
from mayavi.mlab import *
from mayavi import mlab
grbs = pygrib.open("lfff00000000c_1h.grb")
data = grbs.select(name='Geometric Height of the earths surface above sea level')[0].values
# --> data is a simple 2D array
mlab.figure(1, fgcolor=(0,0,0), bgcolor=(1,1,1))
s = surf(data, colormap='gist_earth')
mlab.title("geom. height", size = 0.5)
So what I actually want is to increase the spacing for the x and y axes in the resulting picture, but I don't know how to do this. I know that I somehow have to work with array_source.spacing = array([ 5., 5., 1.]) in my Python code, but I don't know how.
I actually figured out what solves my problem:
I simply added warp_scale to my surf() call. This changes the z-scale, and since I was only interested in changing the x and y axes in the same way, this solves my problem.
s = surf(data, colormap='gist_earth', warp_scale=0.05)
Perhaps this helps other people with the same issue.
