Too many arguments used by python scipy.optimize.curve_fit

I'm attempting to do some curve fitting within a class instance method, and the curve_fit function is giving my class instance method too many arguments.
The code is
class HeatData(hx.HX):
    """Class for handling data from heat exchanger experiments."""
then several lines of methods that work fine, then my function is:
    def get_flow(pressure_drop, coeff):
        """Sets flow based on coefficient and pressure drop."""
        flow = coeff * pressure_drop**0.5
        return flow
and the curve_fit function call
    def set_flow_array(self):
        """Sets experimental flow rate through heat exchanger"""
        flow = self.flow_data.flow
        pressure_drop = self.flow_data.pressure_drop
        popt, pcov = spopt.curve_fit(self.get_flow, pressure_drop, flow)
        self.exh.flow_coeff = popt
        self.exh.flow_array = self.exh.flow_coeff * self.exh.pressure_drop**0.5
gives the error
get_flow() takes exactly 2 arguments (3 given)
I can make it work by defining get_flow outside of the class and calling it like this:
spopt.curve_fit(get_flow, pressure_drop, flow)
but that's no good because it really needs to be a method within the class to be as versatile as I want. How can I get this to work as a class instance method?
I'd also like to be able to pass self to get_flow to give it more parameters that are not fit parameters used by curve_fit. Is this possible?

Unlucky case, and maybe a bug in curve_fit. curve_fit uses inspect to determine the number of starting values, which gets confused or misled if there is an extra self.
So giving a starting value should avoid the problem, I thought. However, there is also an isscalar(p0) check in the condition (I have no idea why), and I think it would be good to report it as a problem or bug:
if p0 is None or isscalar(p0):
    # determine number of parameters by inspecting the function
    import inspect
    args, varargs, varkw, defaults = inspect.getargspec(f)
edit: avoiding the scalar as starting value
>>> np.isscalar([2])
False
means that the example with only 1 parameter works if the starting value is defined as a list, e.g. similar to the example below:
mc.optimize([2])
An example with two arguments and a given starting value avoids the inspect call, and everything is fine:
import numpy as np
from scipy.optimize import curve_fit

class MyClass(object):
    def get_flow(self, pressure_drop, coeff, coeff2):
        """Sets flow based on coefficient and pressure drop."""
        flow = coeff * pressure_drop**0.5 + coeff2
        return flow

    def optimize(self, start_value=None):
        coeff = 1
        pressure_drop = np.arange(20.)
        flow = coeff * pressure_drop**0.5 + np.random.randn(20)
        return curve_fit(self.get_flow, pressure_drop, flow, p0=start_value)

mc = MyClass()
print(mc.optimize([2, 1]))

import inspect
args, varargs, varkw, defaults = inspect.getargspec(mc.get_flow)
print(args, len(args))
EDIT: This bug has been fixed so bound methods can now be passed as the first argument for curve_fit, if you have a sufficiently new version of scipy.
Commit of bug fix submission on github
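To illustrate the fixed behaviour, here is a minimal self-contained sketch under a current SciPy (synthetic data; the question's hx.HX base class is omitted):

import numpy as np
from scipy.optimize import curve_fit

class HeatData(object):
    def get_flow(self, pressure_drop, coeff):
        """Sets flow based on coefficient and pressure drop."""
        return coeff * pressure_drop**0.5

    def set_flow_coeff(self, pressure_drop, flow):
        # On a sufficiently new scipy, the bound method can be passed
        # directly; curve_fit ignores `self` when counting parameters.
        popt, pcov = curve_fit(self.get_flow, pressure_drop, flow)
        self.flow_coeff = popt[0]

hd = HeatData()
pd_data = np.arange(1.0, 21.0)
flow_data = 2.5 * pd_data**0.5 + 0.1 * np.random.randn(20)  # synthetic data
hd.set_flow_coeff(pd_data, flow_data)
print(hd.flow_coeff)  # close to 2.5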

If you define get_flow inside your HeatData class you'll have to have self as its first parameter: def get_flow(self, pressure_drop, coeff):
EDIT: after looking up the definition of curve_fit, I found that the prototype is curve_fit(f, xdata, ydata, p0=None, sigma=None, **kw), so the first arg must be a callable whose first argument is the independent variable:
Try with a closure:
def set_flow_array(self):
    """Sets experimental flow rate through heat exchanger"""
    flow = self.flow_data.flow
    pressure_drop = self.flow_data.pressure_drop

    def get_flow(pressure_drop, coeff):
        """Sets flow based on coefficient and pressure drop."""
        # here you can use self.what_you_need
        # you can even call a self.get_flow(pressure_drop, coeff) method :)
        flow = coeff * pressure_drop**0.5
        return flow

    popt, pcov = spopt.curve_fit(get_flow, pressure_drop, flow)
    self.exh.flow_coeff = popt
    self.exh.flow_array = self.exh.flow_coeff * self.exh.pressure_drop**0.5
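A lambda adapter over a bound method works the same way as the closure on older SciPy versions, assuming get_flow is defined with self as the first sentence of this answer suggests:

popt, pcov = spopt.curve_fit(lambda pd, coeff: self.get_flow(pd, coeff),
                             pressure_drop, flow)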

Try dropping the "self" and making the call: spopt.curve_fit(get_flow, pressure_drop, flow)

The first argument of an instance method definition should always be self. It gets passed automatically and refers to the instance the method is called on, so the method always receives one more argument than you explicitly pass when calling it.
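A tiny illustration of that count (hypothetical class):

class Demo(object):
    def f(self, a, b):   # three parameters, counting self
        return a + b

d = Demo()
print(d.f(1, 2))   # you pass 2 arguments; Python actually calls Demo.f(d, 1, 2)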

The only pythonic way to deal with this is to let Python know get_flow is a staticmethod: a function that you put in the class because conceptually it belongs there, but that doesn't need access to the instance and therefore doesn't need self.
@staticmethod
def get_flow(pressure_drop, coeff):
    """Sets flow based on coefficient and pressure drop."""
    flow = coeff * pressure_drop**0.5
    return flow
Staticmethods can be recognized by the fact that self is not used anywhere in the function.
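With the decorator in place, both the class and its instances hand curve_fit a plain function, so the original call should work unchanged; a short sketch under that assumption:

# Both lookups resolve to the same plain function, with no self bound:
HeatData.get_flow(pressure_drop, 2.0)
self.get_flow(pressure_drop, 2.0)

# So inside set_flow_array the original call now has matching arity:
popt, pcov = spopt.curve_fit(self.get_flow, pressure_drop, flow)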

Related

ODE with non-analytical time-dependent parameters in PyMC3

I'm working on solving the following ODE with PyMC3:
def production(y, t, p):
    return p[0]*getBeam(t) - p[1]*y[0]
getBeam(t) is my time-dependent coefficient. The coefficients are given by an array of data which is accessed by the time index as follows:
def getBeam(t):
    nBeam = I[int(t/10)]*pow(10, -6)/q_e
    return nBeam
I have successfully implemented it using scipy.integrate.odeint, but I have a hard time doing it with pymc3.ode. In fact, using the following:
ode_model = DifferentialEquation(func=production, times=x, n_states=1, n_theta=3, t0=0)

with pm.Model() as model:
    a = pm.Uniform("S-Factor", lower=0.01, upper=100)
    ode_solution = ode_model(y0=[0], theta=[a, Yield, lambd])
I obviously get the error TypeError: __trunc__ returned non-Integral (type TensorVariable), since t is a TensorVariable and thus cannot be used to index the array in which the coefficients are stored.
Is there a way to overcome this difficulty? I thought about using theano.function, but I cannot get it working since, unfortunately, the coefficients cannot be expressed by any analytical function: they are just stored inside the array I, whose index represents the time variable t.
Thanks
Since you already have a working implementation with scipy.integrate.odeint, you could use theano.compile.ops.as_op, though it comes with some inconveniences (see how to fit a method belonging to an instance with pymc3? and How to write a custom Deterministic or Stochastic in pymc3 with theano.op?)
Using your exact definitions for production and getBeam, the following code seems to work for me:
from scipy.integrate import odeint
from theano.compile.ops import as_op
import theano.tensor as tt
import pymc3 as pm

def ScipySolveODE(a):
    return odeint(production, y0=[0], t=x, args=([a, Yield, lambd],)).flatten()

@as_op(itypes=[tt.dscalar], otypes=[tt.dvector])
def TheanoSolveODE(a):
    return ScipySolveODE(a)

with pm.Model() as model:
    a = pm.Uniform("S-Factor", lower=0.01, upper=100)
    ode_solution = TheanoSolveODE(a)
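For completeness, a sketch of how the wrapped solver could be tied to data and sampled. Here y_obs and the noise prior are hypothetical, and since as_op provides no gradient, a gradient-free step method such as Slice is needed:

with model:
    sigma = pm.HalfNormal("sigma", sd=1.0)            # hypothetical noise scale
    pm.Normal("obs", mu=ode_solution, sd=sigma, observed=y_obs)
    trace = pm.sample(step=pm.Slice())                # NUTS needs gradients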
Sorry I know this is more of a workaround than an actual solution...

Problem with integral calculation with OOP in Python

First, I would like to calculate the integral, and then I'd like to plot the function F(x), but I have the following error:
F() missing 1 required positional argument: 't'
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad

x = np.arange(-20, 20, 0.5)

class NLA():
    def __init__(self, b=-20*10**-15, E=200*10**-9, w0=18*10**-6, t=50*10**-15, s=1.76, A=0.2, L=0.1):
        self.b = b
        self.E = E
        self.w0 = w0
        self.t = t
        self.s = s
        self.A = A
        self.L = L
        self.I0 = self.s*self.E/(self.t*np.pi*self.w0**2)
        self.a0 = (1/self.L)*np.log(10)*self.A
        self.Leff = 0.01*(1 - np.exp(-self.L*self.a0))/self.a0
        self.c = self.b*self.I0*self.Leff

    def f(self, t, x):
        return np.log(1 + ((self.c*np.exp(-t**2))/(1 + self.x**2)))

    def F(self, x, t):
        return ((1 + x**2)/(self.c*np.pi**(1/2)))*quad(self.f(t), -np.inf, np.inf, args=(x))[0]

nla = NLA()
T = nla.F(x)
def F(self, x, t):
    return ((1 + x**2)/(self.c*np.pi**(1/2)))*quad(self.f(t), -np.inf, np.inf, args=(x))[0]
change to
def F(self, x):
    temp = quad(self.f, -np.inf, np.inf, args=(x,))[0]
    return ((1 + x**2)/(self.c*np.pi**(1/2)))*temp
quad is supposed to get a function, in this case self.f; quad will call it with the integration variable t plus the args tuple.
I split off temp so the quad call is easier to see (and, if needed, to debug).
F should not take t as an argument. (Note that as written, f also references self.x, which the class never defines; it should use its x argument instead.)
When using functions like quad, follow the documentation carefully. If anything is confusing, practice with its examples. I don't think the OOP is causing problems, except that all those "self"s may be confusing you and obscuring the basic quad calls.
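A minimal, self-contained illustration of the calling pattern quad expects (the integrand here is made up and unrelated to the question's physics). Note that quad evaluates one scalar x at a time, so plotting over the question's x array would need a loop or np.vectorize:

import numpy as np
from scipy.integrate import quad

def integrand(t, x):
    # quad supplies t (the integration variable); x arrives via args
    return np.exp(-t**2) * np.cos(x * t)

def F(x):
    # pass the function object itself (not integrand(t)); args must be a tuple
    val, err = quad(integrand, -np.inf, np.inf, args=(x,))
    return val

print(F(0.5))  # one scalar result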

Python / GPyOpt: Optimizing only one argument

I'm currently trying to find the minimum of some function f(arg1, arg2, arg3, ...) via Bayesian (Gaussian-process) optimization using the GPyOpt module. While f(...) takes many input arguments, I only want to optimize a single one of them. How do you do that?
My current "solution" is to put f(...) in a dummy class and specify the not-to-be-optimized arguments while initializing it. While this is arguably the most pythonesque way of solving this problem, it`s also way more complicated than it has any right to be.
Short working example for a function f(x, y, method) with fixed y (a numeric) and method (a string) while optimizing x:
import GPyOpt
import numpy as np

# dummy class
class TarFun(object):
    # fix y while initializing the object
    def __init__(self, y, method):
        self.y = y
        self.method = method

    # actual function to be minimized
    def f(self, x):
        if self.method == 'sin':
            return np.sin(x - self.y)
        elif self.method == 'cos':
            return np.cos(x - self.y)

# create TarFun object with y fixed to 2 and use 'sin' method
tarFunObj = TarFun(y=2, method='sin')
# describe properties of x
space = [{'name': 'x', 'type': 'continuous', 'domain': (-5, 5)}]
# create GPyOpt object that will only optimize x
optObj = GPyOpt.methods.BayesianOptimization(tarFunObj.f, space)
There definitely has to be a simpler way. But all the examples I found optimize all arguments, and I couldn't figure it out reading the code on github (I thought I would find the information in GPyOpt.core.task.space, but had no luck).
GPyOpt supports this natively with context. You describe the whole domain of your function, and then fix the values of some of the variables with a context dictionary when calling the optimization routine. The API looks like this:
myBopt.run_optimization(..., context={'var1': .3, 'var2': 0.4})
More details can be found in this tutorial notebook about contextual optimization.
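A sketch of the full pattern using the f(x, y) from the question (domains and iteration count are illustrative):

import GPyOpt
import numpy as np

def f(X):
    # GPyOpt passes a 2D array: one row per evaluation point, one column per variable
    x, y = X[:, 0], X[:, 1]
    return np.sin(x - y).reshape(-1, 1)

space = [{'name': 'x', 'type': 'continuous', 'domain': (-5, 5)},
         {'name': 'y', 'type': 'continuous', 'domain': (-5, 5)}]

opt = GPyOpt.methods.BayesianOptimization(f, domain=space)
# only x is actually searched; y is pinned to 2 by the context
opt.run_optimization(max_iter=15, context={'y': 2})
print(opt.x_opt, opt.fx_opt)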
I would check out the partial function from the functools module in the standard library. It allows you to partially specify a function's arguments, so for example:
import GPyOpt
import numpy as np
from functools import partial

def f(x, y=0):
    return np.sin(x - y)

objective = partial(f, y=2)
space = [{'name': 'x', 'type': 'continuous', 'domain': (-5, 5)}]
opt = GPyOpt.methods.BayesianOptimization(
    objective, domain=space
)
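Running the partially applied objective then follows the usual GPyOpt pattern:

opt.run_optimization(max_iter=15)
print(opt.x_opt)  # minimum of sin(x - 2) on (-5, 5) lies near x = 2 - pi/2, about 0.43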

How to write a custom Deterministic or Stochastic in pymc3 with theano.op?

I'm doing some pymc3 and I would like to create custom Stochastics; however, there doesn't seem to be a lot of documentation about how it's done. I know how to use the as_op way, however apparently that makes it impossible to use the NUTS sampler, in which case I don't see the advantage of pymc3 over pymc.
The tutorial mentions that it can be done by inheriting from theano.Op. But can anyone show me how that would work (I'm still getting started on theano)? I have two Stochastics that I want to define.
The first one should be easier: it's an N-dimensional vector F that has only constant parent variables:
with myModel:
    F = DensityDist('F', lambda value: pymc.skew_normal_like(value, F_mu_array, F_std_array, F_a_array), shape=N)
I want a skew normal distribution, which doesn't seem to be implemented in pymc3 yet, so I just imported the pymc2 version. Unfortunately, F_mu_array, F_std_array, F_a_array and F are all N-dimensional vectors, and the lambda thing doesn't seem to work with an N-dimensional list value.
Firstly, is there a way to make the lambda input an N-dimensional array? If not, I guess I would need to define the Stochastic F directly, and this is where I presume I need theano.Op to make it work.
The second example is a more complicated function of other Stochastics. Here is how I want to define it (incorrectly, at the moment):
with myModel:
    ln2_var = Uniform('ln2_var', lower=-10, upper=4)
    sigma = Deterministic('sigma', exp(0.5*ln2_var))
    A = Uniform('A', lower=-10, upper=10, shape=5)
    C = Uniform('C', lower=0.0, upper=2.0, shape=5)
    sw = Normal('sw', mu=5.5, sd=0.5, shape=5)

    # F from before
    F = DensityDist('F', lambda value: skew_normal_like(value, F_mu_array, F_std_array, F_a_array), shape=N)
    M = Normal('M', mu=M_obs_array, sd=M_stdev, shape=N)

    # Radius forward-model (THIS IS THE STOCHASTIC IN QUESTION)
    R = Normal('R', mu=R_forward(F, M, A, C, sw, N), sd=sigma, shape=N)
Where the function R_forward(F,M,A,C,sw,N) is naively defined as:
from theano.tensor import lt, le, eq, gt, ge

def R_forward(Flux, Mass, A, C, sw, num):
    for i in range(num):
        if lt(Mass[i], 0.2):
            if lt(Flux[i], sw[0]):
                muR = C[0]
            else:
                muR = A[0]*log10(Flux[i]) + C[0] - A[0]*log10(sw[0])
        elif (le(0.2, Mass[i]) or le(Mass[i], 0.5)):
            if lt(Flux[i], sw[1]):
                muR = C[1]
            else:
                muR = A[1]*log10(Flux[i]) + C[1] - A[1]*log10(sw[1])
        elif (le(0.5, Mass[i]) or le(Mass[i], 1.5)):
            if lt(Flux[i], sw[2]):
                muR = C[2]
            else:
                muR = A[2]*log10(Flux[i]) + C[2] - A[2]*log10(sw[2])
        elif (le(1.5, Mass[i]) or le(Mass[i], 3.5)):
            if lt(Flux[i], sw[3]):
                muR = C[3]
            else:
                muR = A[3]*log10(Flux[i]) + C[3] - A[3]*log10(sw[3])
        else:
            if lt(Flux[i], sw[4]):
                muR = C[4]
            else:
                muR = A[4]*log10(Flux[i]) + C[4] - A[4]*log10(sw[4])
    return muR
This presumably won't work of course. I can see how I would use as_op, but I want to preserve the NUTS sampling.
I realize this is a bit late now, but I thought I'd answer the question (rather vaguely) anyways.
If you want to define a stochastic function (e.g. a probability distribution), then you need to do a couple of things:
First, define a subclass of either Discrete (pymc3.distributions.Discrete) or Continuous which has at least the method logp, returning the log-likelihood of your stochastic. If you define this as a simple symbolic equation (x+1), I believe you do not need to take care of any gradients (but don't quote me on this; see the documentation about this). I'll get on to more complicated cases below.
In the unfortunate case that you need to do anything more complex, as in your second example (pymc3 now has a skew normal distribution implemented, by the way), you need to define the operations required for it (used in the logp method) as a Theano Op. If you need no derivatives, then as_op does the job, but as you said, gradients are kind of the idea of pymc3.
This is where it gets complicated. If you want to use NUTS (or need gradients for whatever reason), then you need to implement the operation used in logp as a subclass of theano.gof.Op. Your new Op class (let's just call it Op from now on) needs at least two or three methods. The first defines the inputs/outputs of the Op (check the Op documentation). The perform() method (or variants you might choose) is the one that does the operation you want (your R_forward function, for example); this can be done in pure Python, if you so wish. The third method, grad(), is where you define the gradient of perform()'s output with respect to the inputs. The actual output of grad() is a bit different, but not a big deal.
And it is in grad() that using Theano pays off. If you define your entire perform() in Theano, then it might be that you can easily use automatic differentiation (theano.tensor.grad or theano.tensor.jacobian) to do the work for you (see the example below). However, this is not necessarily going to be easy.
In your second example, it would mean implementing your R_forward function in Theano, which could be complicated.
Here I include a somewhat minimal example of an Op that I created while learning to do these things.
import numpy as np
import theano as th

def my_th_fun():
    """ Some needed auxiliary functions.
    """
    X = th.tensor.vector('X')
    SCALE = th.tensor.scalar('SCALE')
    X.tag.test_value = np.array([1, 2, 3, 4])
    SCALE.tag.test_value = 5.
    Scale, upd_sm_X = th.scan(lambda x, scale: scale*(scale + x),
                              sequences=[X],
                              outputs_info=[SCALE])
    fun_Scale = th.function(inputs=[X, SCALE], outputs=Scale)
    D_out_d_scale = th.tensor.grad(Scale[-1], SCALE)
    fun_d_out_d_scale = th.function([X, SCALE], D_out_d_scale)
    return Scale, fun_Scale, D_out_d_scale, fun_d_out_d_scale

class myOp(th.gof.Op):
    """ Op subclass with a somewhat silly computation. It uses
    th.scan, and th.tensor.grad is used to calculate the gradient
    automagically in the grad() method.
    """
    __props__ = ()
    itypes = [th.tensor.dscalar]
    otypes = [th.tensor.dvector]

    def __init__(self, *args, **kwargs):
        super(myOp, self).__init__(*args, **kwargs)
        self.base_dist = np.arange(1, 5)
        (self.UPD_scale, self.fun_scale,
         self.D_out_d_scale, self.fun_d_out_d_scale) = my_th_fun()

    def perform(self, node, inputs, outputs):
        scale = inputs[0]
        updated_scale = self.fun_scale(self.base_dist, scale)
        out1 = self.base_dist[0:2].sum()
        out2 = self.base_dist[2:4].sum()
        maxout = np.max([out1, out2])
        exp_out1 = np.exp(updated_scale[-1]*(out1 - maxout))
        exp_out2 = np.exp(updated_scale[-1]*(out2 - maxout))
        norm_const = exp_out1 + exp_out2
        outputs[0][0] = np.array([exp_out1/norm_const, exp_out2/norm_const])

    def grad(self, inputs, output_gradients):  # working!
        """ Calculates the gradient of the output of the Op wrt
        to the input. As a simple example, the input is scalar.
        Notice how the output is actually the gradient multiplied
        by the output_gradients, which is an input provided by
        theano when calculating gradients.
        """
        scale = inputs[0]
        X = th.tensor.as_tensor(self.base_dist)
        # Do I need to recalculate all this or can I assume that perform() has
        # always been called before grad() and thus can take it from there?
        # In any case, this is a small enough example to recalculate quickly:
        all_scale, _ = th.scan(lambda x, scale_1: scale_1*(scale_1 + x),
                               sequences=[X],
                               outputs_info=[scale])
        updated_scale = all_scale[-1]
        out1 = self.base_dist[0:2].sum()  # same slices as in perform()
        out2 = self.base_dist[2:4].sum()
        maxout = np.max([out1, out2])
        exp_out1 = th.tensor.exp(updated_scale*(out1 - maxout))
        exp_out2 = th.tensor.exp(updated_scale*(out2 - maxout))
        norm_const = exp_out1 + exp_out2
        d_S_d_scale = th.tensor.grad(all_scale[-1], scale)
        Jac1 = (-(out1 - out2)*d_S_d_scale*
                th.tensor.exp(updated_scale*(out1 + out2 - 2*maxout))/(norm_const**2))
        Jac2 = -Jac1
        return Jac1*output_gradients[0][0] + Jac2*output_gradients[0][1],
This Op can then be used inside the logp() method of a stochastic in pymc3:
import pymc3 as pm

class myDist(pm.distributions.Discrete):
    def __init__(self, invT, *args, **kwargs):
        super(myDist, self).__init__(*args, **kwargs)
        self.invT = invT
        self.myOp = myOp()

    def logp(self, value):
        return self.myOp(self.invT)[value]
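Finally, a hedged sketch of wiring such a distribution into a model (names are illustrative, and the discrete variable is assumed to take values 0 or 1 so it can index the Op's two-element output):

with pm.Model() as model:
    invT = pm.HalfNormal('invT', sd=1.0)    # hypothetical continuous parent
    state = myDist('state', invT=invT)      # discrete, values 0 or 1
    # NUTS can sample invT because myOp defines grad(); pymc3 assigns a
    # discrete step method (e.g. Metropolis) to `state` automatically.
    trace = pm.sample(1000)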
I hope it helps any (hopeless) pymc3/theano newbie out there.

Using scipy.optimize.curve_fit within a class

I have a class describing a mathematical function. The class needs to be able to least-squares fit itself to passed-in data, i.e. you can call a method like this:
classinstance.Fit(x,y)
and it adjusts its internal variables to best fit the data. I'm trying to use scipy.optimize.curve_fit for this, and it needs me to pass in a model function. The problem is that the model function is within the class and needs to access the variables and members of the class to compute the data. However, curve_fit can't call a function whose first parameter is self. Is there a way to make curve_fit use a method of the class as its model function?
Here is a minimum executable snippet to show the issue:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# This is a class which encapsulates a gaussian and fits itself to data.
class GaussianComponent():
    # This is a formula string showing the exact code used to produce the gaussian.
    # It has to be printed for the user, and it can be used to compute values.
    Formula = 'self.Amp*np.exp(-((x-self.Center)**2/(self.FWHM**2*np.sqrt(2))))'

    # These parameters describe the gaussian.
    Center = 0
    Amp = 1
    FWHM = 1

    # HERE IS THE CONUNDRUM: IF I LEAVE SELF IN THE DECLARATION, CURVE_FIT
    # CANNOT CALL IT SINCE IT REQUIRES THE WRONG NUMBER OF PARAMETERS.
    # IF I REMOVE IT, FITFUNC CAN'T ACCESS THE CLASS VARIABLES.
    def FitFunc(self, x, y, Center, Amp, FWHM):
        eval('y - ' + self.Formula.replace('self.', ''))

    # This uses curve_fit to adjust the gaussian parameters to best match the
    # data passed in.
    def Fit(self, x, y):
        #FitFunc = lambda x, y, Center, Amp, FWHM: eval('y - ' + self.Formula.replace('self.', ''))
        FitParams, FitCov = curve_fit(self.FitFunc, x, y, (self.Center, self.Amp, self.FWHM))
        self.Center = FitParams[0]
        self.Amp = FitParams[1]
        self.FWHM = FitParams[2]

    # Give back a vector which describes what this gaussian looks like.
    def GetPlot(self, x):
        y = eval(self.Formula)
        return y

# Make a gaussian with default shape and position (height 1 at the origin, FWHM 1).
g = GaussianComponent()

# Make a space in which we can plot the gaussian.
x = np.linspace(-5, 5, 100)
y = g.GetPlot(x)

# Make some "experimental data" which is just the default shape, noisy, and
# moved up the y axis a tad so the best fit will be different.
ynoise = y + np.random.normal(loc=0.1, scale=0.1, size=len(x))

# Draw it
plt.plot(x, y, x, ynoise)
plt.show()

# Do the fit (but this doesn't work...)
g.Fit(x, y)
And this produces the following graph and then crashes when it tries to do the fit, since the model function is incorrect.
Thanks in advance!
I spent some time looking at your code and turned out 2 minutes late, unfortunately. Anyhow, to make things a bit more interesting I've edited your class a bit. Here's what I concocted:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

class GaussianComponent():
    def __init__(self, func, params=None):
        self.formula = func
        self.params = params

    def eval(self, x):
        allowed_locals = {key: self.params[key] for key in self.params}
        allowed_locals["x"] = x
        allowed_globals = {"np": np}
        return eval(self.formula, allowed_globals, allowed_locals)

    def Fit(self, x, y):
        # curve_fit passes candidate parameter values positionally after x,
        # so wrap eval() to route them back into the params dict.
        names = list(self.params)
        def model(x, *values):
            self.params = dict(zip(names, values))
            return self.eval(x)
        p0 = [self.params[name] for name in names]
        FitParams, FitCov = curve_fit(model, x, y, p0=p0)
        self.fitparams = FitParams

# Make a gaussian with default shape and position (height 1 at the origin, FWHM 1).
g = GaussianComponent("Amp*np.exp(-((x-Center)**2/(FWHM**2*np.sqrt(2))))",
                      params={"Amp": 1, "Center": 0, "FWHM": 1})
**SNIPPED FOR BREVITY**
Perhaps you'll find this a more satisfying solution?
Currently all your gauss parameters are class attributes, shared by every instance; until an instance shadows them with its own values, all instances read the same ones, which is rarely what you want here. By making the parameters instance attributes, you get rid of that. That is why we have classes in the first place.
Your issue with self stems from the fact that you write self in your Formula. Now you don't have to any more. I think it makes a bit more sense like this, because when you instantiate an object of the class you can add as many or as few params to your declared function as you want. It doesn't even have to be a gaussian now (unlike before).
Just throw all the params into a dictionary, just like curve_fit does, and forget about them.
By explicitly stating what eval can use, you help make sure that any evil-doers have a harder time breaking your code. It's still possible though; it always is with eval.
Good luck, ask if you need something clarified.
Ahh! It was actually a bug in my code. If I change this line:
def FitFunc(self, x, y, Center, Amp, FWHM):
to
def FitFunc(self, x, Center, Amp, FWHM):
Then we are fine. So curve_fit does correctly handle the self parameter but my model function shouldn't include y.
(Embarrassed!)
