I have a least squares minimization problem that has the following form
Where the parameters I want to optimize over are x and everything else is known.
scipy.optimize.least_squares has the following form:
scipy.optimize.least_squares(fun, x0)
where x0 is an initial condition and fun is a "Function which computes the vector of residuals"
After reading the documentation, I'm a little confused about what fun wants me to return.
If I do the summation inside fun, then I'm afraid that it would compute RHS, which is not equivalent to the LHS (...or is it, when it comes to minimization?)
Thanks for any assistance!
According to the documentation of scipy.optimize.least_squares, the argument fun must return the vector of residuals that drives the minimization: a one-dimensional array of shape (m,), where m is the number of residuals (one entry per term of your sum). Do not square or sum the residuals yourself, because least_squares handles that detail on its own and minimizes the sum of their squares. A scalar return value is also accepted, but it is treated as a single residual, so returning the already-summed cost would make the solver minimize the square of that sum and would hide the per-residual structure it uses to estimate the Jacobian. Only the residuals as such should be supplied.
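As a minimal sketch of what fun should return, assume a hypothetical straight-line model a*t + b fitted to synthetic, noise-free data (the model, the data and the names t, y and residuals are illustrative and not from the question):

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic data for a hypothetical model y = a*t + b (assumed for illustration).
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * t + 1.0

def residuals(x):
    """Return one residual per data point, shape (m,). No squaring, no summing."""
    a, b = x
    return a * t + b - y

result = least_squares(residuals, x0=np.array([0.0, 0.0]))

print(result.x)     # fitted parameters
print(result.cost)  # 0.5 * sum(residuals**2) at the solution
```

The solver squares and sums the returned vector internally, which is why result.cost is reported as half the sum of squared residuals.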
I would like to know how I can represent the third derivative term:
in FiPy (Python). I know that the diffusion term is represented as
DiffusionTerm(coeff=D)
and higher order diffusion terms as
DiffusionTerm(coeff=(Gamma1, Gamma2))
But I cannot figure out a way to represent this third derivative. Thanks
Is the vector v defined in terms of a (scalar) solution variable? If not, just write the term explicitly:
v.divergence.faceGrad.divergence
If v is a function of the solution variable (say \phi), then there's no mechanism to do this like there is with higher-order diffusion, but there really isn't a need (nor is there a need for higher-order diffusion). Split your equation into two 2nd order PDEs and couple them:
\partial \phi / \partial t = \nabla^2 \nabla\cdot\vec{v}
can be rewritten as
\partial \phi / \partial t = \nabla^2 \psi \\
\psi = \nabla\cdot\vec{v}
which would be
TransientTerm(var=phi) == DiffusionTerm(var=psi)
ImplicitSourceTerm(var=psi) == ConvectionTerm(coeff=v, var=???)
I'd need to know more about v and your full set of equations to advise further on what that ConvectionTerm should look like.
[notes added given the information that these terms arise from the Korteweg-de Vries equation]:
While it is not strictly true that v isn't a function of some phi in the KdV equation, there still is no way to put the \partial^3 v / \partial x^3 term into a form that FiPy can readily make use of. If v is scalar, then \partial^3 v / \partial x^3 is vector. If v is vector, then \partial^3 v / \partial x^3 is either scalar or tensor. There's no way to make the rank of this term consistent with the others unless you dot it with a unit vector, in which case it's just some source without an efficient implicit representation.
At the root, 1D equations are always misleading. It's critical to know what's a scalar and what's a vector. FiPy, as a finite volume code, is applying the divergence theorem when it solves, and so it is necessary to know when one is dealing with the divergence of a flux (which FiPy can treat implicitly) or just some random partial derivative (which it cannot).
Reading through the derivations of the KdV equation, it appears that so many long-wave approximations and variable substitutions have been made that any trace of vector calculus has been cast away. As a result, this is not a PDE that FiPy has efficient forms for. You can write v.faceGrad.divergence.grad.dot([[1]]), and FiPy should accept this, but it won't solve very effectively.
Further, since the KdV equations are about wave propagation and are essentially hyperbolic, FiPy really isn't well suited (some diffusive element is generally needed for the algorithms underlying FiPy to converge). You might take a look at Clawpack or hp-FEM.
I am using fmin_l_bfgs_b for a bounded minimization on 4 parameters.
I would like to inspect the gradient at the minimum of the cost function and for this I call the d['grad'] parameter as described in the documentation of fmin_l_bfgs_b. My problem is that d['grad'] is an array of size 4 looking like:
'grad': array([ 8.38440428e-05, -5.72697445e-04, 3.21875859e-03,
-2.21115926e+00])
I would expect it to be a single value close to zero. Does this have something to do with the number of the parameters I am using for the minimization (4)..? Not what I would expect but any help would be appreciated.
What you are getting is the gradient of the cost function with respect to each parameter, in turn.
To picture it, suppose there were only two parameters, x and y. The cost function is a surface z as a function of x and y.
The optimization is finding a minimum point on that surface.
That's where the gradients with respect to both x and y are zero (or close to it).
If either gradient is not zero, you are not at a minimum, and you would descend further.
As a further point, you could well be interested in the curvature, or second derivative, because high curvature means a narrow (precise) minimum, while low curvature means a nearly flat minimum, with very uncertain estimates.
The second derivative in the x,y case would not be a 2-vector, but a 2x2-matrix (called a "Hessian", just to snow your friends).
You might want to think about why it's a 2x2-matrix.
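A small sketch of the point above, using a made-up two-parameter cost function (the function, its bounds and all names are assumptions for illustration). It shows that d['grad'] has one entry per parameter, and that in a bounded problem an entry can stay far from zero when the corresponding parameter ends up on its bound, since the optimizer cannot descend any further in that direction:

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def cost(p):
    # Hypothetical cost surface with its unconstrained minimum at (1, 3).
    x, y = p
    return (x - 1.0) ** 2 + (y - 3.0) ** 2

def cost_grad(p):
    # Analytic gradient: one partial derivative per parameter.
    x, y = p
    return np.array([2.0 * (x - 1.0), 2.0 * (y - 3.0)])

# y is not allowed to reach 3, so the constrained minimum sits on y = 2.
bounds = [(-5.0, 5.0), (0.0, 2.0)]
p_opt, f_opt, info = fmin_l_bfgs_b(cost, np.array([0.0, 0.0]),
                                   fprime=cost_grad, bounds=bounds)

grad = info["grad"]                            # shape (2,), one entry per parameter
hessian = np.array([[2.0, 0.0], [0.0, 2.0]])   # second derivatives: a 2x2 matrix

print(p_opt, grad)
```

Here the x component of the gradient is essentially zero, while the y component is not, because y is pinned at its upper bound.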
I am trying to fit a polynomial to my data, e.g.
import scipy as sp
x = [1,6,9,17,23,28]
y = [6.1, 7.52324, 5.71, 5.86105, 6.3, 5.2]
and say I know the degree of polynomial (e.g.: 3), then I just use scipy.polyfit method to get the polynomial of a given degree:
fittedModelFunction = sp.polyfit(x, y, 3)
func = sp.poly1d(fittedModelFunction)
QUESTIONS:
1) How can I tell in addition that the resulting function func must be always positive (i.e. f(x) >= 0 for any x)?
2) How can I further define a constraint (e.g. number of (local) min and max points, etc.) in order to get a better fitting?
Is there something like this:
http://mail.scipy.org/pipermail/scipy-user/2007-July/013138.html
but more accurate?
Always Positive
I haven't been able to find a scipy reference that determines whether a function is always positive, but an indirect way would be to find all the roots - Scipy Roots - of the function and inspect the limits near those roots. There are a few cases to consider:
No roots at all
Pick any x and evaluate the function. Since the function does not cross the x-axis because of a lack of roots, any positive result will indicate the function is positive!
Finite number of roots
This is probably the most likely case. You would have to inspect the limits before and after each root - Scipy Limits. You would have to specify your own minimum acceptable delta for the limit however. I haven't seen a 2-sided limit method provided by Scipy, but it looks simple enough to make your own.
from sympy import limit

# f: function, v: variable to limit, p: point, d: delta
# returns two limit values, one just below and one just above p
def twoSidedLimit(f, v, p, d):
    return limit(f, v, p - d), limit(f, v, p + d)
Infinite roots
I don't think that polyfit would generate an oscillating function, but this is something to consider. I don't know how to handle this with the method I have already offered... Um, hope it does not happen?
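For a fitted polynomial, a related check can be sketched with NumPy alone. Note that this swaps in a slightly different technique from the limit-based one above: instead of probing around the roots, it looks for the lowest value of the polynomial on a closed interval, which can only occur at an endpoint or at a real root of the derivative. The helper name min_on_interval and the choice of interval (the range of the data) are assumptions for illustration:

```python
import numpy as np

x = [1, 6, 9, 17, 23, 28]
y = [6.1, 7.52324, 5.71, 5.86105, 6.3, 5.2]
func = np.poly1d(np.polyfit(x, y, 3))

def min_on_interval(poly, a, b):
    """Lowest value of a np.poly1d polynomial on the closed interval [a, b]."""
    crit = poly.deriv().r                   # roots of the derivative
    crit = crit[np.isreal(crit)].real       # keep only the real ones
    crit = crit[(crit > a) & (crit < b)]    # keep only those inside the interval
    candidates = np.concatenate(([a, b], crit))
    return poly(candidates).min()

lowest = min_on_interval(func, min(x), max(x))
is_nonnegative = lowest >= 0    # True if the fit never dips below zero on [a, b]
print(lowest, is_nonnegative)
```

This only certifies the interval that was checked. An odd-degree polynomial such as the cubic here always becomes negative somewhere far enough outside it.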
Constraints
The only built-in form of constraints seems to be limited to the optimize library of SciPy. A crude way to enforce constraints for polyfit would be to get the function from polyfit, generate a vector of values for various x, and try to select values from the vector that violate the constraint. If you try to use filter, map, or lambda it may be slow with large vectors since python's filter makes a copy of the list/vector being filtered. I can't really help in this regard.
I'm using scipy's fmin_l_bfgs_b optimization method on a 2-dimensional function available as a black box. Gradients cannot be evaluated directly, so I'm asking the method to approximate the gradients by setting approx_grad = True.
I want to know how the approximate gradients are computed. My guess is that at each point, for each dimension, gradient is approximated by forward difference. So for each point in N dimensions, N evaluations are made to get the partial derivatives. Is this correct?
Jacobian approximation is done with scipy.optimize.approx_fprime function, docs:
          f(xk[i] + epsilon[i]) - f(xk[i])
f'[i] = ---------------------------------
                    epsilon[i]
where epsilon is a parameter of fmin_l_bfgs_b:
epsilon : float
Step size used when approx_grad is True, for numerically calculating the gradient
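To make the formula concrete, here is a sketch that compares a hand-written forward difference with scipy.optimize.approx_fprime on a made-up two-dimensional function (the function f, the helper forward_difference and the evaluation point are assumptions for illustration). The hand-written version needs one evaluation at the base point plus one per dimension:

```python
import numpy as np
from scipy.optimize import approx_fprime

def f(p):
    # Hypothetical stand-in for the black-box function.
    x, y = p
    return x ** 2 + 3.0 * x * y

def forward_difference(func, xk, epsilon):
    """Forward-difference gradient: 1 base evaluation + N shifted evaluations."""
    xk = np.asarray(xk, dtype=float)
    f0 = func(xk)
    grad = np.zeros_like(xk)
    for i in range(xk.size):
        step = np.zeros_like(xk)
        step[i] = epsilon               # perturb only the i-th coordinate
        grad[i] = (func(xk + step) - f0) / epsilon
    return grad

xk = np.array([1.0, 2.0])
eps = 1e-8
manual = forward_difference(f, xk, eps)
library = approx_fprime(xk, f, eps)
exact = np.array([2.0 * xk[0] + 3.0 * xk[1], 3.0 * xk[0]])

print(manual, library, exact)
```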
I do not know how scipy does it particularly. A popular approach is to calculate them as follows:
(f(x+e)-f(x-e))/(2*e) (apparently no LaTeX is supported here)
This is accurate up to quadratic terms in e, i.e. the error is of order e^2 (just write out the Taylor expansion of each term and subtract them)
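A quick numerical sketch of that difference in accuracy, using exp(x) at x = 1 as an arbitrary test function (the function, the point and the step size are chosen only for illustration):

```python
import math

def forward_diff(f, x, e):
    # One-sided difference: error shrinks linearly with e.
    return (f(x + e) - f(x)) / e

def central_diff(f, x, e):
    # Symmetric difference: the first-order error terms cancel, leaving O(e^2).
    return (f(x + e) - f(x - e)) / (2 * e)

x0 = 1.0
e = 1e-4
exact = math.exp(x0)    # the derivative of exp is exp itself

forward_error = abs(forward_diff(math.exp, x0, e) - exact)
central_error = abs(central_diff(math.exp, x0, e) - exact)
print(forward_error, central_error)
```

The price of the central difference is two extra function evaluations per dimension instead of one.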
I am trying to integrate a function over a list of points and pass the whole array to an integration function in order to vectorize the thing. For starters, calling scipy.integrate.quad is way too slow since I have something like 10 000 000 points to integrate. Using scipy.integrate.romberg does the trick much faster, almost instantaneously, while quad is slow since you must loop over it or vectorize it.
My function is quite complicated, but for demonstration purposes, let's say I want to integrate x^2 from a to b, but x is an array of scalars to evaluate x. For example
import numpy as np
from scipy.integrate import quad, romberg
def integrand(x, y):
return x**2 + y**2
quad(integrand, 0, 10, args=(10)) # this fails since y is not a scalar
romberg(integrand, 0, 10) # y works here, giving the integral over
# the entire range
But this only works for fixed bounds. Is there a way to do something like
z = np.arange(20,30)
romberg(integrand, 0, z) # Fails since the function doesn't seem to
# support variable bounds
The only way I see is to re-implement the algorithm itself in numpy and use that instead so I can have variable bounds. Is there any function that supports something like this? There is also romb, where you must supply the values of the integrand directly and a dx interval, but that will be too imprecise for my complicated function (the Marcum Q function; I couldn't find any implementation, which could be another way to do it).
The best approach when trying to evaluate a special function is to write a function that uses the properties of the function to quickly and accurately evaluate it in all parameter regimes. It is quite unlikely that a single approach will give accurate (or even stable) results for all ranges of parameters. Direct evaluation of an integral, as in this case, will almost certainly break down in many cases.
That being said, the general problem of evaluating an integral over many ranges can be solved by turning the integral into a differential equation and solving that. Roughly, the steps would be
1. Given an integral I(t), which I will assume is an integral of a function f(x) from 0 to t [this can be generalized to an arbitrary lower limit], write it as the differential equation dI/dt = f(t).
2. Solve this differential equation using scipy.integrate.odeint() for some initial condition (here I(0) = 0) over some range of times from 0 to t. This range should contain all limits of interest. How finely this is sampled depends on the function and how accurately it needs to be evaluated.
3. The result will be the value of the integral from 0 to t for the set of t we input. We can turn this into a "continuous" function using interpolation. For example, using a spline we can define i = scipy.interpolate.InterpolatedUnivariateSpline(t,I).
4. Given a set of upper and lower limits in arrays b and a, respectively, we can evaluate them all at once as res=i(b)-i(a).
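The steps above can be sketched as follows, using f(x) = x^2 as a stand-in integrand so that the result can be compared with the exact value (b^3 - a^3)/3. The integrand, the sampling grid and the limit arrays are assumptions for illustration, not the Marcum Q function itself:

```python
import numpy as np
from scipy.integrate import odeint
from scipy.interpolate import InterpolatedUnivariateSpline

def f(x):
    # Stand-in integrand; replace with the real one.
    return x ** 2

def dI_dt(I, t):
    # odeint expects func(y, t); the running integral I does not feed back here.
    return f(t)

# Sample grid covering every limit of interest (here 0 to 30).
t = np.linspace(0.0, 30.0, 301)
I = odeint(dI_dt, 0.0, t)[:, 0]     # I(t) = integral of f from 0 to t, with I(0) = 0

# Turn the samples into a "continuous" function.
i = InterpolatedUnivariateSpline(t, I)

# Evaluate many integrals at once from arrays of lower and upper limits.
a = np.zeros(10)
b = np.arange(20, 30, dtype=float)
res = i(b) - i(a)

print(res)
print((b ** 3 - a ** 3) / 3.0)      # exact values for comparison
```

The ODE is solved once, and every additional pair of limits costs only two spline evaluations.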
Whether this approach will work in your case will require you to carefully study it over your range of parameters. Also note that the Marcum Q function involves a semi-infinite integral. In principle this is not a problem, just transform the integral to one over a finite range. For example, consider the transformation x->1/x. There is no guarantee this approach will be numerically stable for your problem.