polyfit refining: setting polynomial to be always positive - python

I am trying to fit a polynomial to my data, e.g.
import scipy as sp
x = [1,6,9,17,23,28]
y = [6.1, 7.52324, 5.71, 5.86105, 6.3, 5.2]
and say I know the degree of the polynomial (e.g. 3); then I just use the scipy.polyfit method to get the polynomial of a given degree:
fittedModelFunction = sp.polyfit(x, y, 3)
func = sp.poly1d(fittedModelFunction)
QUESTIONS:
1) How can I additionally require that the resulting function func be always positive (i.e. f(x) >= 0 for any x)?
2) How can I further impose a constraint (e.g. on the number of local minima and maxima, etc.) in order to get a better fit?
Is there something like this:
http://mail.scipy.org/pipermail/scipy-user/2007-July/013138.html
but more accurate?

Always Positive
I haven't been able to find a scipy routine that determines whether a function is positive-definite, but an indirect way would be to find all the roots of the function (Scipy Roots) and inspect the limits near those roots. There are a few cases to consider:
No roots at all
Pick any x and evaluate the function. Since the function does not cross the x-axis because of a lack of roots, any positive result will indicate the function is positive!
Finite number of roots
This is probably the most likely case. You would have to inspect the limits before and after each root (SymPy's limit can do this). You would have to specify your own minimum acceptable delta for the limit, however. I haven't seen a two-sided limit method provided out of the box, but it looks simple enough to make your own.
from sympy import limit

# f: function, v: variable to take the limit in, p: point, d: delta
# returns the two one-sided limit values around p
def twoSidedLimit(f, v, p, d):
    return limit(f, v, p - d), limit(f, v, p + d)
Infinite roots
I don't think that polyfit would generate an oscillating function, but this is something to consider. I don't know how to handle this with the method I have already offered... Um, hope it does not happen?
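Putting the no-roots and finite-roots cases together in code, here is a minimal sketch (assuming the func object returned by sp.poly1d above; the helper name and tolerance are only illustrative). A polynomial can only change sign at a real root, so it suffices to test the midpoints between consecutive real roots plus one point beyond each end:

import numpy as np

def is_nonnegative(poly, tol=1e-12):
    # poly: a numpy.poly1d object, e.g. func from above
    roots = poly.r
    real_roots = np.sort(roots[np.abs(roots.imag) < tol].real)
    if real_roots.size == 0:
        # no real roots: the sign never changes, so one test point is enough
        return poly(0.0) >= 0
    test_points = np.concatenate(([real_roots[0] - 1],
                                  (real_roots[:-1] + real_roots[1:]) / 2,
                                  [real_roots[-1] + 1]))
    return bool(np.all(poly(test_points) >= -tol))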
Constraints
The only built-in form of constraints seems to be limited to SciPy's optimize library. A crude way to enforce constraints on a polyfit result would be to take the function returned by polyfit, evaluate it on a vector of x values, and pick out the values that violate the constraint. If you try to use filter, map, or lambda it may be slow with large vectors, since these run as plain Python loops rather than vectorized NumPy operations. I can't really help much beyond that in this regard.
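As a rough sketch of that idea (reusing the x data and the func object from the question; the grid size is arbitrary):

import numpy as np

xs = np.linspace(min(x), max(x), 1000)   # dense grid over the fitted range
values = func(xs)
violations = xs[values < 0]              # grid points where the positivity constraint fails
if violations.size:
    print("constraint violated near x =", violations[:5])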

Related

How to solve a delay differential equation numerically

I would like to compute the Buchstab function numerically. It is defined by the delay differential equation

w(u) = 1/u for 1 <= u <= 2,    (u*w(u))' = w(u-1) for u > 2.

How can I compute this numerically efficiently?
To get a general feeling of how DDE integration works, I'll give some code, based on the low-order Heun method (to avoid uninteresting details while still being marginally useful).
In the numerical integration the previous values are treated as a function of time like any other time-dependent term. As there is no real functional expression for it, the solution so far is used as a function table for interpolation. The interpolation error order should be as high as the error order of the ODE integrator, which is easy to arrange for low-order methods but requires extra effort for higher-order methods. The solve_ivp stepper classes provide such a "dense output" interpolation per step, which can be assembled into a function over the currently existing integration interval.
So much for the theory, now the practice. Select step size h = 0.05 and convert the given history function into the start of the solution function table:
import numpy as np

h = 0.05
u = 1
u_arr = []
w_arr = []
while u < 2 + 0.5*h:
    u_arr.append(u)
    w_arr.append(1/u)       # history: w(u) = 1/u on [1, 2]
    u += h
Then solve the equation; for the delayed value, use interpolation in the function table, here with numpy.interp. There are other functions with more options in scipy.interpolate.
Note that h needs to be smaller than the smallest delay, so that the delayed values are from a previous step. Which is the case here.
u = u_arr[-1]
w = w_arr[-1]
while u < 4:
    # Heun step; the delayed value w(u-1) comes from the interpolated table
    k1 = (-w + np.interp(u-1, u_arr, w_arr)) / u
    us, ws = u + h, w + h*k1                        # Euler predictor
    k2 = (-ws + np.interp(us-1, u_arr, w_arr)) / us
    u, w = us, w + 0.5*h*(k1 + k2)                  # corrected (Heun) value
    u_arr.append(u)
    w_arr.append(w)                                 # store the corrected value, not the predictor
Now the numerical approximation can be further processed, for instance plotted.
import matplotlib.pyplot as plt
plt.plot(u_arr, w_arr); plt.grid(); plt.show()

Fitting with functional parameter constraints in Python

I have some data {x_i, y_i} and I want to fit a model function y = f(x,a,b,c) to find the best-fitting values of the parameters (a,b,c); however, the three of them are not totally independent but constrained by 1 < b, 0 <= c < 1 and g(a,b,c) > 0, where g is a "good" function. How could I implement this in Python, since with curve_fit one cannot put such parametric constraints directly?
I have been reading about lmfit, but I only see simple numerical constraints like 1 < b and 0 <= c < 1, not the one with g(a,b,c) > 0, which is the most important.
If I understand correctly, you have
from scipy.special import gamma

def g(a, b, c):
    c1 = 1.0 - c
    cx = 1/c1
    c2 = 2*c1
    g = a*a*b*gamma(2+cx)*gamma(cx)/gamma(1+3/c2) - b*b/(1+b**c2)**(1/c2)
    return g
If so, and if I have the math right, this could be represented as
a = sqrt((g+b*b/(1+b**c2)**(1/c2))*gamma(1+3/c2)/(b*gamma(2+cx)*gamma(cx)))
Which is to say that you could think about your problem as having a variable g which is > 0 and a value for a derived from b, c, and g by the above expression.
And that you can do with lmfit and its expression-based constraint mechanism. You would have to add the gamma function, as with
from lmfit import Parameters
from scipy.special import gamma
params = Parameters()
params._asteval.symtable['gamma'] = gamma
and then set up the parameters with bounds and constraints. I would probably follow the math above to allow better debugging and use something like:
params.add('b', 1.5, min=1)
params.add('c', 0.4, min=0, max=1)
params.add('g', 0.2, min=0)
params.add('c1', expr='1-c')
params.add('cx', expr='1.0/c1')
params.add('c2', expr='2*c1')
params.add('gprod', expr='b*gamma(2+cx)*gamma(cx)/gamma(1+3/c2)')
params.add('bfact', expr='(1+b**c2)**(1/c2)')
params.add('a', expr='sqrt(g+b*b/(bfact*gprod))')
Note that this gives 3 actual variables (now g, b, and c) with plenty of derived values calculated from these, including a. I would certainly check all that math. It looks like you're safe from negative**fractional_power, sqrt(negative), and gamma(-1), but be aware of these possibilities that will kill the fit.
You could embed all of that into your fitting function, but using constraint expressions gives you the ability to constrain parameter values independently of how the fitting or model function is defined.
Hope that helps. Again, if this does not get to what you are trying to do, post more details about the constraint you are trying to impose.
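For completeness, a hedged sketch of how those parameters might then be fed into a fit with lmfit.minimize; the model inside the residual is only a placeholder, not your actual f(x, a, b, c):

import numpy as np
from lmfit import minimize

def residual(params, x, data):
    a = params['a'].value
    b = params['b'].value
    c = params['c'].value
    model = a * np.exp(-b * x) + c          # placeholder model, for illustration only
    return model - data

# x, data: your measured arrays
result = minimize(residual, params, args=(x, data))
print(result.params['a'].value, result.params['g'].value)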
Like James Phillips, I was going to suggest SciPy's curve_fit. But the way that you have defined your function, one of the constraints is on the function itself, and SciPy's bounds are defined only in terms of input variables.
What, exactly, are the forms of your functions? Can you transform them so that you can use a standard definition of bounds, and then reverse the transformation to give a function in the original form that you wanted?
I have encountered a related problem when trying to fit exponential regressions using SciPy's curve_fit. The parameter search varies the parameters in a linear fashion, and it's really easy to fail to establish a gradient. If I write a function which fits the logarithm of the function I want, it's much easier to make curve_fit work. Then, for my final result, I take the exponential of my fitted function.
This same strategy could work for you. Predict ln(y). The value of that function can be unbounded. Then for your final result, output exp(ln(y)) = y.
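A minimal sketch of that strategy with an illustrative exponential model (not your actual f); fitting in log space keeps the parameters unbounded:

import numpy as np
from scipy.optimize import curve_fit

def log_model(x, lna, b):
    return lna + b * x                       # ln(a * exp(b*x)) with lna = ln(a)

xdata = np.linspace(0, 5, 50)
ydata = 2.0 * np.exp(0.7 * xdata) * np.random.normal(1.0, 0.05, 50)

popt, _ = curve_fit(log_model, xdata, np.log(ydata), p0=(0.0, 1.0))
lna_fit, b_fit = popt
y_fit = np.exp(log_model(xdata, lna_fit, b_fit))   # back-transform to the original scale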

Find global minimum using scipy.optimize.minimize

Given a 2D point p, I'm trying to calculate the smallest distance between that point and a functional curve, i.e., find the point on the curve which gives me the smallest distance to p, and then calculate that distance. The example function that I'm using is
f(x) = 2*sin(x)
My distance function for the distance between some point p and a provided function is
import numpy as np

def dist(p, x, func):
    x = np.append(x, func(x))                     # the point (x, f(x)) on the curve
    return sum((i - j)**2 for i, j in zip(x, p))  # squared Euclidean distance to p
It takes as input, the point p, a position x on the function, and the function handle func. Note this is a squared Euclidean distance (since minimizing in Euclidean space is the same as minimizing in squared Euclidean space).
The crucial part of this is that I want to be able to provide bounds for my function so really I'm finding the closest distance to a function segment. For this example my bounds are
bounds = [0, 2*np.pi]
I'm using the scipy.optimize.minimize function to minimize my distance function, using the bounds. A result of the above process is shown in the graph below.
This is a contour plot showing the distance from the sin function. Notice how there appears to be a discontinuity in the contours. For convenience, I've plotted a few points around that discontinuity and the "closest" points on the curve that they map to.
What's actually happening here is that the scipy function is finding a local minimum (given some initial guess) but not the global one, and that is causing the discontinuity. I know that reliably finding the global minimum of an arbitrary function is impossible in general, but I'm looking for a more robust way to find it here.
Possible methods for finding a global minimum would be
Choose a smart initial guess, but this amounts to knowing approximately where the global minimum is to begin with, which is using the solution of the problem to solve it.
Use multiple initial guesses and choose the answer which reaches the best minimum. This however seems like a poor choice, especially when my functions get more complicated (and higher-dimensional).
Find the minimum, then perturb the solution and find the minimum again, hoping that I may have knocked it into a better minimum. I'm hoping there is some way to do this simply, without invoking some complicated MCMC algorithm or the like. Speed counts for this process.
Any suggestions about the best way to go about this, or possibly directions to useful functions that may tackle this problem would be great!
As suggested in a comment, you could try a global optimization algorithm such as scipy.optimize.differential_evolution. However, in this case, where you have a well-defined and analytically tractable objective function, you can employ a semi-analytical approach, taking advantage of the first-order necessary conditions for a minimum.
In the following, the first function is the distance metric and the second function is (the numerator of) its derivative w.r.t. x, that should be zero if a minimum occurs at some 0<x<2*np.pi.
import numpy as np
def d(x, p):
    # squared distance between p and the curve point (x, 2*sin(x))
    return np.sum((p - np.array([x, 2*np.sin(x)]))**2)

def diff_d(x, p):
    # derivative of d with respect to x; its roots are candidate minimizers
    return -2*p[0] + 2*x - 4*p[1]*np.cos(x) + 4*np.sin(2*x)
Now, given a point p, the only potential minimizers of d(x,p) are the roots of diff_d(x,p) (if any), as well as the boundary points x=0 and x=2*np.pi. It turns out that diff_d may have more than one root. Noting that the derivative is a continuous function, the pychebfun library offers a very efficient method for finding all the roots, avoiding cumbersome approaches based on the scipy root-finding algorithms.
The following function provides the minimum of d(x, p) for a given point p:
import pychebfun
def min_dist(p):
    # Chebyshev interpolant of the derivative on the bounded domain
    f_cheb = pychebfun.Chebfun.from_function(lambda x: diff_d(x, p), domain=(0, 2*np.pi))
    # candidates: interior roots of the derivative plus the two boundary points
    potential_minimizers = np.r_[0, f_cheb.roots(), 2*np.pi]
    return np.min([d(x, p) for x in potential_minimizers])
Here is the result:
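As a quick sanity check, the semi-analytical routine can be compared against the differential_evolution route mentioned above for a single, arbitrary query point (the wrapper is needed because differential_evolution passes a length-1 array):

from scipy.optimize import differential_evolution

p = np.array([2.0, -1.0])     # arbitrary example point

print(min_dist(p))            # semi-analytical squared distance

res = differential_evolution(lambda x: d(x[0], p), bounds=[(0, 2*np.pi)])
print(res.fun)                # should agree with the value above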

scipy integrate over array with variable bounds

I am trying to integrate a function over a list of points and pass the whole array to an integration function in order to vectorize the thing. For starters, calling scipy.integrate.quad is way too slow since I have something like 10 000 000 points to integrate. scipy.integrate.romberg does the trick much faster, almost instantaneously, while quad is slow since you must loop over it or vectorize it.
My function is quite complicated, but for demonstration purposes, let's say I want to integrate x^2 + y^2 over x from a to b, where y is an array of scalars the integrand is evaluated with. For example
import numpy as np
from scipy.integrate import quad, romberg
def integrand(x, y):
    return x**2 + y**2

y = np.linspace(0, 1, 100)             # an array of y values (illustrative)
quad(integrand, 0, 10, args=(y,))      # this fails since y is not a scalar
romberg(integrand, 0, 10, args=(y,))   # y works here, giving the integral over
                                       # the entire range for every element of y
But this only works for fixed bounds. Is there a way to do something like
z = np.arange(20,30)
romberg(integrand, 0, z)   # fails, since romberg doesn't seem to
                           # support variable (array-valued) bounds
The only way I see is to re-implement the algorithm itself in numpy and use that instead, so I can have variable bounds. Is there any function that supports something like this? There is also romb, where you must supply the values of the integrand directly along with a dx interval, but that will be too imprecise for my complicated function (the Marcum Q function; I couldn't find any implementation of it, which could be another way to do it).
The best approach when trying to evaluate a special function is to write a function that uses the properties of the function to quickly and accurately evaluate it in all parameter regimes. It is quite unlikely that a single approach will give accurate (or even stable) results for all ranges of parameters. Direct evaluation of an integral, as in this case, will almost certainly break down in many cases.
That being said, the general problem of evaluating an integral over many ranges can be solved by turning the integral into a differential equation and solving that. Roughly, the steps would be
Given an integral I(t), which I will assume is an integral of a function f(x) from 0 to t [this can be generalized to an arbitrary lower limit], write it as the differential equation dI/dt = f(t).
Solve this differential equation using scipy.integrate.odeint() for some initial conditions (here I(0)) over some range of times from 0 to t. This range should contain all limits of interest. How finely this is sampled depends on the function and how accurately it needs to be evaluated.
The result will be the value of the integral from 0 to t for the set of t we input. We can turn this into a "continuous" function using interpolation. For example, using a spline we can define i = scipy.interpolate.InterpolatedUnivariateSpline(t,I).
Given a set of upper and lower limits in arrays b and a, respectively, then we can evaluate them all at once as res=i(b)-i(a).
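A minimal sketch of those steps, using f(x) = x**2 as a stand-in integrand (the real integrand, grid density, and limits are problem-specific):

import numpy as np
from scipy.integrate import odeint
from scipy.interpolate import InterpolatedUnivariateSpline

def f(x):
    return x**2                      # stand-in for the real (e.g. Marcum Q) integrand

t = np.linspace(0, 40, 4000)         # must cover every limit of interest
I = odeint(lambda I, t: f(t), 0.0, t).ravel()    # solve dI/dt = f(t), I(0) = 0

i = InterpolatedUnivariateSpline(t, I)           # "continuous" antiderivative

a = np.arange(20, 30)                # lower limits
b = a + 5                            # upper limits
res = i(b) - i(a)                    # all the integrals at once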
Whether this approach will work in your case will require you to carefully study it over your range of parameters. Also note that the Marcum Q function involves a semi-infinite integral. In principle this is not a problem, just transform the integral to one over a finite range. For example, consider the transformation x->1/x. There is no guarantee this approach will be numerically stable for your problem.
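To spell out that transformation: with the substitution x = 1/u (so dx = -du/u**2), the semi-infinite integral becomes a finite one,

    integral from a to infinity of f(x) dx  =  integral from 0 to 1/a of f(1/u) / u**2 du,

and the same ODE/interpolation machinery applies on (0, 1/a], provided the transformed integrand stays well behaved as u -> 0.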

Derivative of an array in python?

Currently I have two numpy arrays: x and y of the same size.
I would like to write a function (possibly calling numpy/scipy... functions if they exist):
def derivative(x, y, n=1):
    # something
    return result
where result is a numpy array of the same size as x, containing the values of the n-th derivative of y with respect to x (I would like the derivative to be evaluated using several values of y in order to avoid non-smooth results).
This is not a simple problem, but there are a lot of methods that have been devised to handle it. One simple solution is to use finite-difference methods. The command numpy.diff() computes finite differences, and you can specify the order of the difference (note that it returns differences, not derivatives, so you still need to divide by the grid spacing).
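A minimal first-derivative sketch along these lines, assuming a uniform grid (np.gradient is included because it keeps the original array length, which matches the requirement in the question):

import numpy as np

x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)
dx = x[1] - x[0]

dy_forward = np.diff(y, n=1) / dx    # first-order differences, one element shorter than y
dy_central = np.gradient(y, x)       # central differences, same length as y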
If the numpy function doesn't do what you want, Wikipedia also has a page that lists the finite-difference coefficients needed for derivatives of different orders and accuracies.
Depending on your application you can also use scipy.fftpack.diff, which uses a completely different technique to do the same thing, though your function needs a well-defined Fourier transform.
There are lots and lots and lots of variants (e.g. summation by parts, finite differencing operators, or operators designed to preserve known evolution constants in your system of equations) on both of the two ideas above. What you should do will depend a great deal on what the problem is that you are trying to solve.
The good thing is that a lot of work has been done in this field. The Wikipedia page for Numerical Differentiation has some resources (though it is focused on finite-differencing techniques).
The findiff project is a Python package that can do derivatives of arrays of any dimension with any desired accuracy order (of course depending on your hardware restrictions). It can handle arrays on uniform as well as non-uniform grids and also create generalizations of derivatives, i.e. general linear combinations of partial derivatives with constant and variable coefficients.
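A short sketch of how findiff is typically used, based on its documented FinDiff interface (treat the exact arguments as an assumption to check against the project's documentation):

import numpy as np
from findiff import FinDiff          # pip install findiff

x = np.linspace(0, 10, 200)
y = np.sin(x)
dx = x[1] - x[0]

d_dx = FinDiff(0, dx, 1, acc=4)      # axis 0, spacing dx, 1st derivative, 4th-order accuracy
dy_dx = d_dx(y)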
Would something like this solve your problem?
import numpy as np

def get_inflection_points(arr, n=1):
    """
    Returns inflection points from an array.
    arr: array
    n: n-th discrete difference
    """
    inflections = []
    dx = 0
    for i, x in enumerate(np.diff(arr, n)):
        if x >= dx and i > 0:
            inflections.append(i*n)
        dx = x
    return inflections
