gradient at the minimum in fmin_l_bfgs_b - python

I am using fmin_l_bfgs_b for a bounded minimization on 4 parameters.
I would like to inspect the gradient of the cost function at the minimum, so I look at the d['grad'] entry of the dictionary returned by fmin_l_bfgs_b, as described in its documentation. My problem is that d['grad'] is an array of size 4 that looks like:
'grad': array([ 8.38440428e-05, -5.72697445e-04, 3.21875859e-03,
-2.21115926e+00])
I would expect it to be a single value close to zero. Does this have something to do with the number of parameters I am using for the minimization (4)? It is not what I would expect, but any help would be appreciated.

What you are getting is the gradient of the cost function with respect to each parameter, in turn.
To picture it, suppose there were only two parameters, x and y. The cost function is a surface z as a function of x and y.
The optimization is finding a minimum point on that surface.
That's where the gradients with respect to both x and y are zero (or close to it).
If either gradient is not zero, you are not at a minimum, and you would descend further.
As a further point, you could well be interested in the curvature, or second derivative, because high curvature means a narrow (precise) minimum, while low curvature means a nearly flat minimum, with very uncertain estimates.
The second derivative in the x,y case would not be a 2-vector, but a 2x2-matrix (called a "Hessian", just to snow your friends).
You might want to think about why it's a 2x2-matrix.
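As a concrete illustration, here is a minimal sketch with a made-up 4-parameter quadratic cost (not the original one): the dictionary returned by fmin_l_bfgs_b has a 'grad' entry with one partial derivative per parameter, and all four entries should be near zero at an interior minimum.
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

# Toy quadratic cost in 4 parameters (a stand-in for the real cost function)
def cost(theta):
    return np.sum((theta - np.array([1.0, 2.0, 3.0, 4.0]))**2)

x0 = np.zeros(4)
bounds = [(-10.0, 10.0)] * 4
x_opt, f_opt, info = fmin_l_bfgs_b(cost, x0, approx_grad=True, bounds=bounds)

print(info['grad'].shape)  # (4,) -- one partial derivative per parameter
print(info['grad'])        # each entry should be close to zero at the minimum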

Related

scipy.optimize.least_squares Minimize sum of least squares

I have a least squares minimization problem that has the following form, where the parameters I want to optimize over are x and everything else is known.
scipy.optimize.least_squares has the following form:
scipy.optimize.least_squares(fun, x0)
where x0 is an initial condition and fun is a "Function which computes the vector of residuals"
After reading the documentation, I'm a little confused about what fun wants me to return.
If I do the summation inside fun, then I'm afraid that it would compute RHS, which is not equivalent to the LHS (...or is it, when it comes to minimization?)
Thanks for any assistance!
According to the documentation of scipy.optimize.least_squares, the argument fun must return the vector of residuals that the minimization works with: a one-dimensional array of shape (m,), where m is the number of residuals (a scalar return value is treated as a single residual). Do not square or sum the residuals inside fun; least_squares squares and sums them on its own, so only the raw residuals should be supplied.
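For instance, here is a minimal sketch with a made-up exponential model (the model, data, and parameter names are illustrative, not taken from the original post):
import numpy as np
from scipy.optimize import least_squares

# Hypothetical data for the model y = a * exp(b * t)
t = np.linspace(0, 1, 50)
y = 2.0 * np.exp(-1.5 * t)

def fun(x):
    # Return the raw residuals, one per data point (shape (m,));
    # least_squares squares and sums them internally.
    a, b = x
    return a * np.exp(b * t) - y

res = least_squares(fun, x0=[1.0, -1.0])
print(res.x)     # fitted parameters, close to (2.0, -1.5)
print(res.cost)  # 0.5 * sum of squared residuals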

How to use `scipy.optimize.leastsq` to optimize in the joint least squares direction?

I want to be able to move along a gradient in the joint least squares direction.
I thought I could do this using scipy.optimize.leastsq (http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html). (Perhaps I'm wrong, maybe there's an easier way to do this?).
I'm having difficulty understanding what to use, and how to move in the joint least squares direction, while still increasing the parameters.
What I need to do is input something like this:
[1,0]
And, have it move along the least squares direction, which would mean, increasing either or both values 1 and 0, but doing so such that the sum of the squared values is as small as possible.
This would mean [1,0] would increase to [1, <something barely greater than 0>], and eventually would reach [1,1]. At which point, both 1's would increase at the same rate.
How would I program this? It seems to me like scipy.optimize.leastsq would be of use here, but I cannot figure out how to use it.
Thank you.
I don't think you need scipy.optimize.leastsq, because your problem can be solved analytically. At any point, the gradient of the function np.sum(x**2), where x is an array, is 2*x. So, if you want the smallest possible increase in the sum of squares, you have to increase the component with the smallest gradient, which you can find with np.argmin. Here is a simple solution:
import numpy as np

def g(x):
    # Gradient of np.sum(x**2) with respect to x
    return 2 * x

x = np.array([1., 0.])
for _ in range(200):
    eps = np.zeros_like(x)
    index = np.argmin(g(x))  # component whose increase raises the sum of squares the least
    eps[index] = 0.01        # step size, or whatever
    x += eps
print(x)
When multiple components have the same gradient value, np.argmin returns the first occurrence, so you will see some oscillation, which you can reduce by using a smaller eps.

Estimation of fundamental matrix or essential matrix from feature matching

I am estimating the fundamental matrix and the essential matrix using the built-in functions in OpenCV. I provide input points to the functions using ORB and a brute-force matcher. These are the problems I am facing:
1. The essential matrix that I compute from the built-in function does not match the one I find by mathematical computation from the fundamental matrix as E = K.t() * F * K.
2. As I vary the number of points used to compute F and E, the values of F and E keep changing. The function uses the RANSAC method. How do I know which value is the correct one?
3. I am also using a built-in function to decompose E and find the correct R and T from the 4 possible solutions. The values of R and T also change with the changing E. More concerning is the fact that the direction vector T changes without a pattern: say it was in the X direction for one value of E; if I change E, it changes to Y or Z. Why is this happening? Has anyone else had the same problem?
How do I resolve this problem? My project involves taking measurements of objects from images.
Any suggestions or help would be welcome!
Both F and E are defined only up to a scale factor. It may help to normalize the matrices, e.g. by dividing by the last element.
RANSAC is a randomized algorithm, so you will get a different result every time. You can test how much it varies by triangulating the points, or by computing the reprojection errors. If the results vary too much, you may want to increase the number of RANSAC trials or decrease the distance threshold, to make sure that RANSAC converges to the correct solution.
Yes, computing the fundamental matrix gives a different matrix every time, as it is defined only up to a scale factor.
It is a rank-2 matrix with 7 degrees of freedom (3 rotation, 3 translation, 1 scale).
The fundamental matrix is a 3x3 matrix; F33 (3rd row, 3rd column) is the scale factor.
You may ask why we fix the entry F33 to a constant: because x_left^T * F * x_right = 0 is a homogeneous equation with infinitely many solutions, and fixing F33 adds the constraint that removes the scale ambiguity.
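To make the scale point concrete, here is a small sketch of computing E from F and normalizing both by their last element (K and F below are hypothetical placeholders, not values from the question):
import numpy as np

# Hypothetical camera intrinsics and fundamental matrix (placeholders only)
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
F = np.array([[ 1.2e-6, -3.4e-5,  2.1e-3],
              [ 2.9e-5,  0.8e-6, -4.7e-3],
              [-1.9e-3,  5.2e-3,  1.0   ]])

# Essential matrix from the fundamental matrix: E = K^T * F * K
E = K.T @ F @ K

# F and E are only defined up to scale, so normalize before comparing,
# e.g. by dividing by the last element (assuming it is nonzero).
F_norm = F / F[2, 2]
E_norm = E / E[2, 2]
print(F_norm)
print(E_norm)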

Find global minimum using scipy.optimize.minimize

Given a 2D point p, I'm trying to calculate the smallest distance between that point and a functional curve, i.e., find the point on the curve which gives me the smallest distance to p, and then calculate that distance. The example function that I'm using is
f(x) = 2*sin(x)
My distance function for the distance between some point p and a provided function is
import numpy as np

def dist(p, x, func):
    # The point on the curve at position x is (x, func(x))
    x = np.append(x, func(x))
    # Squared Euclidean distance between that point and p
    return sum((i - j)**2 for i, j in zip(x, p))
It takes as input, the point p, a position x on the function, and the function handle func. Note this is a squared Euclidean distance (since minimizing in Euclidean space is the same as minimizing in squared Euclidean space).
The crucial part of this is that I want to be able to provide bounds for my function so really I'm finding the closest distance to a function segment. For this example my bounds are
bounds = [0, 2*np.pi]
I'm using the scipy.optimize.minimize function to minimize my distance function, using the bounds. A result of the above process is shown in the graph below.
This is a contour plot showing the distance from the sin function. Notice how there appears to be a discontinuity in the contours. For convenience, I've plotted a few points around that discontinuity and the "closest" points on the curve that they map to.
What's actually happening here is that the scipy function is finding a local minimum (given some initial guess), but not a global one, and that is causing the discontinuity. I know that finding the global minimum of an arbitrary function is impossible in general, but I'm looking for a more reliable way to find it.
Possible methods for finding a global minimum would be
Choose a smart initial guess, but this amounts to knowing approximately where the global minimum is to begin with, which is using the solution of the problem to solve it.
Use multiple initial guesses and choose the answer which reaches the best minimum. This, however, seems like a poor choice, especially when my functions get more complicated (and higher dimensional).
Find the minimum, then perturb the solution and find the minimum again, hoping that I may have knocked it into a better minimum. I'm hoping there is some way to do this simply without invoking some complicated MCMC algorithm or something like that. Speed counts for this process.
Any suggestions about the best way to go about this, or possibly directions to useful functions that may tackle this problem would be great!
As suggested in a comment, you could try a global optimization algorithm such as scipy.optimize.differential_evolution. However, in this case, where you have a well-defined and analytically tractable objective function, you can employ a semi-analytical approach, taking advantage of the first-order necessary conditions for a minimum.
In the following, the first function is the distance metric and the second function is (the numerator of) its derivative w.r.t. x, that should be zero if a minimum occurs at some 0<x<2*np.pi.
import numpy as np

def d(x, p):
    # Squared distance between p and the point (x, 2*sin(x)) on the curve
    return np.sum((p - np.array([x, 2*np.sin(x)]))**2)

def diff_d(x, p):
    # Derivative of d(x, p) with respect to x
    return -2 * p[0] + 2 * x - 4 * p[1] * np.cos(x) + 4 * np.sin(2*x)
Now, given a point p, the only potential minimizers of d(x,p) are the roots of diff_d(x,p) (if any), as well as the boundary points x=0 and x=2*np.pi. It turns out that diff_d may have more than one root. Noting that the derivative is a continuous function, the pychebfun library offers a very efficient method for finding all the roots, avoiding cumbersome approaches based on the scipy root-finding algorithms.
The following function provides the minimum of d(x, p) for a given point p:
import pychebfun

def min_dist(p):
    # Chebyshev approximation of the derivative on the interval [0, 2*pi]
    f_cheb = pychebfun.Chebfun.from_function(lambda x: diff_d(x, p), domain=(0, 2*np.pi))
    # Candidate minimizers: interior roots of the derivative plus the two boundary points
    potential_minimizers = np.r_[0, f_cheb.roots(), 2*np.pi]
    return np.min([d(x, p) for x in potential_minimizers])
Here is the result:
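As a side note, the global optimizer mentioned at the beginning of this answer can also be applied directly to d over the same interval; a minimal sketch (the query point p below is made up for illustration):
import numpy as np
from scipy.optimize import differential_evolution

def d(x, p):
    # Same squared-distance function as above, repeated so this snippet runs on its own
    return np.sum((p - np.array([x, 2*np.sin(x)]))**2)

p = np.array([2.0, 1.5])  # an illustrative query point

# differential_evolution passes the variable as a length-1 array, so unpack it for d.
result = differential_evolution(lambda x: d(x[0], p), bounds=[(0, 2*np.pi)])
print(result.x, result.fun)  # approximate minimizer and minimal squared distance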

Derivatives blow up in python

I am trying to find higher order derivatives of a dataset (x,y). x and y are 1D arrays of length N.
Let's say I generate them as:
import numpy as np

xder0 = np.linspace(0, 10, 1000)
yder0 = np.sin(xder0)
I define a derivative function which takes in two arrays (x, y) and returns (x1, y1), where y1 is the derivative calculated at each index as (y[i+1]-y[i])/(x[i+1]-x[i]), and x1 is just the mean of x[i+1] and x[i].
Here is the function that does it:
def deriv(x, y):
    delx  = np.zeros(len(x) - 1, dtype=np.longdouble)
    ydiff = np.zeros(len(x) - 1, dtype=np.longdouble)
    for i in range(len(x) - 1):
        delx[i]  = (x[i+1] + x[i]) / 2.0              # midpoint of the interval
        ydiff[i] = (y[i+1] - y[i]) / (x[i+1] - x[i])  # forward difference
    return delx, ydiff
Now to calculate the first derivative, I call this function as:
xder1, yder1 = deriv(xder0, yder0)
Similarly for second derivative, I call this function giving first derivatives as input:
xder2, yder2 = deriv(xder1, yder1)
And it goes on:
xder3, yder3 = deriv(xder2, yder2)
xder4, yder4 = deriv(xder3, yder3)
xder5, yder5 = deriv(xder4, yder4)
xder6, yder6 = deriv(xder5, yder5)
xder7, yder7 = deriv(xder6, yder6)
xder8, yder8 = deriv(xder7, yder7)
xder9, yder9 = deriv(xder8, yder8)
Something peculiar happens once I reach order 7. The 7th-order derivative becomes very noisy! The earlier derivatives are all either sine or cosine functions, as expected, but the 7th order is a noisy sine, and hence all derivatives after that blow up.
Any idea what is going on?
This is a well-known stability issue with numerical interpolation using equally spaced points. Read the answers at http://math.stackexchange.com.
To overcome this problem you have to use non-equally-spaced points, such as the roots of a Legendre polynomial. The instability occurs due to the lack of information at the boundaries, so a higher concentration of points near the boundaries is required, as given by the roots of, say, Legendre polynomials or others with similar properties, such as Chebyshev polynomials.
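One way to put that advice into practice is sketched below, assuming the data are smooth enough to be well approximated by a Chebyshev series (numpy's Chebyshev class is used here; it is not part of the original answer): fit the series once and differentiate the fit, instead of chaining finite differences.
import numpy as np
from numpy.polynomial import Chebyshev

# Same sample data as in the question
xder0 = np.linspace(0, 10, 1000)
yder0 = np.sin(xder0)

# Fit a Chebyshev series to the data, then differentiate the fitted series.
# This avoids the noise amplification of repeated finite differencing.
cheb = Chebyshev.fit(xder0, yder0, deg=30)
d7 = cheb.deriv(7)               # 7th derivative of the fitted series

# The 7th derivative of sin(x) is -cos(x); compare at a test point.
print(d7(5.0), -np.cos(5.0))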
