How to force larger steps on scipy.optimize functions?

I have a function compare_images(k, a, b) that compares two 2d-arrays a and b.
Inside the function, I apply a gaussian_filter with sigma=k to a. My idea is to estimate how much I must smooth image a in order for it to be similar to image b.
The problem is that compare_images will only return different values if the variation in k is over 0.5, and if I do fmin(compare_images, init_guess, (a, b)) it usually gets stuck at the init_guess value.
I believe the problem is that fmin (and minimize) tend to start with very small steps, which in my case reproduce the exact same return value for compare_images, so the method thinks it has already found a minimum. It only tries a couple of times.
Is there a way to force fmin or any other minimizing function from scipy to take larger steps? Or is there any method better suited for my need?
EDIT:
I found a temporary solution.
First, as recommended, I used xtol=0.5 and higher as an argument to fmin.
Even then, I still had some problems, and a few times fmin would return init_guess.
I then created a simple loop so that if the result of fmin equaled init_guess, I would generate another, random init_guess and try again.
It's pretty slow, of course, but now I got it to run. It will take 20h or so to run it for all my data, but I won't need to do it again.
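A minimal sketch of that retry loop (the sigma range for the random guesses is an assumption; compare_images, a, b and init_guess are as described above):

import numpy as np
from scipy.optimize import fmin

k = fmin(compare_images, init_guess, args=(a, b), xtol=0.5)
while np.allclose(k, init_guess):
    # fmin never left the plateau around the guess; try a new random start
    init_guess = np.random.uniform(0.5, 10.0)  # assumed plausible sigma range
    k = fmin(compare_images, init_guess, args=(a, b), xtol=0.5)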
Anyway, to better explain the problem for those still interested in finding a better solution:
I have 2 images, A and B, containing some scientific data.
A looks like a few dots with variable values (it is a matrix in which each valued point represents where an event occurred and its intensity).
B looks like a smoothed heatmap (it is the observed density of occurrences).
B looks just as if you applied a gaussian filter to A, with a bit of semi-random noise.
We are approximating B by applying a gaussian filter with constant sigma to A. This sigma was chosen visually, but only works for a certain class of images.
I'm trying to obtain an optimal sigma for each image, so that later I can find some relation between sigma and the class of event shown in each image.
Anyway, thanks for the help!

Quick check: you probably really meant fmin(compare_images, init_guess, (a,b))?
If gaussian_filter behaves as you say, your function is piecewise constant, meaning that optimizers relying on derivatives (i.e. most of them) are out. You can try a global optimizer like anneal, or brute-force search over a sensible range of k's.
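If the sensible range of k is known, even a plain grid search is enough here. A minimal sketch (the range is an assumption; the 0.5 step matches the stated sensitivity of compare_images):

import numpy as np

ks = np.arange(0.5, 10.0, 0.5)   # assumed plausible sigma range and step
scores = [compare_images(k, a, b) for k in ks]
best_k = ks[np.argmin(scores)]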
However, as you described the problem, in general there will only be a clear, global minimum of compare_images if b is a smoothed version of a. Your approach makes sense if you want to determine the amount of smoothing of a that makes both images most similar.
If the question is "how similar are the images", then I think pixelwise comparison (maybe with a bit of smoothing) is the way to go. Depending on what images we are talking about, it might be necessary to align the images first (e.g. for comparing photographs). Please clarify :-)
edit: Another idea that might help: rewrite compare_images so that it calculates two versions of smoothed-a -- one with sigma=floor(k) and one with ceil(k) (i.e. round k to the next-lower/higher int). Then calculate a_smooth = a_floor*(1-kfrac) + a_ceil*kfrac, with kfrac being the fractional part of k. This way the compare function becomes continuous w.r.t. k.
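A sketch of that idea; the comparison metric (a sum of squared differences) is an assumption, since the original metric isn't shown:

import numpy as np
from scipy.ndimage import gaussian_filter

def compare_images(k, a, b):
    k = float(np.ravel(k)[0])   # fmin passes k as a length-1 array
    kfrac, kfloor = np.modf(k)  # fractional and integer parts of k
    a_floor = gaussian_filter(a, sigma=kfloor)
    a_ceil = gaussian_filter(a, sigma=kfloor + 1)
    a_smooth = a_floor*(1 - kfrac) + a_ceil*kfrac  # blend, continuous in k
    return np.sum((a_smooth - b)**2)               # assumed metric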
Good Luck!

Basin hopping may do a bit better, as it has a high chance of continuing anyway when it gets stuck on a plateau.
I found that on this example function it does reasonably well with a low temperature:
>>> opt.basinhopping(lambda x: int(0.1*x[0]**2 + 0.1*x[1]**2), (5, -5), T=.1)
nfev: 409
fun: 0
x: array([ 1.73267813, -2.54527514])
message: ['requested number of basinhopping iterations completed successfully']
njev: 102
nit: 100

I realize this is an old question but I haven't been able to find many discussions of similar topics. I am facing a similar issue with scipy.optimize.least_squares. I found that xtol did not do me much good; it did not seem to change the step size at all. What made a big difference was diff_step. This sets the step size taken when numerically estimating the Jacobian according to the formula step_size = x_i * diff_step, where x_i is each independent variable. You are using fmin so you aren't calculating Jacobians, but if you used another scipy function like minimize for the same problem, this might help you.
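A toy sketch of the effect; the rounded residuals just mimic a piecewise-constant objective, and the "true" parameters [3, -2] and the step value are arbitrary:

import numpy as np
from scipy.optimize import least_squares

def residuals(p):
    # coarse rounding makes the output insensitive to small parameter changes
    return np.round(p - np.array([3.0, -2.0]), 0)

# diff_step sets the relative step for the finite-difference Jacobian:
# step_size = x_i * diff_step for each variable x_i; a large value lets
# the solver "see" past the plateau
result = least_squares(residuals, x0=np.array([1.0, 1.0]), diff_step=2.0)
print(result.x)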

I had the same problem and got it to work with the 'TNC' method.
res = minimize(f, [1]*2, method='TNC', bounds=[(0, 15)]*2, jac='2-point', options={'disp': True, 'finite_diff_rel_step': 0.1, 'xtol': 0.1, 'accuracy': 0.1, 'eps': 0.1})
The combination of 'finite_diff_rel_step' and setting 'jac' to one of {'2-point', '3-point', 'cs'} did the trick for the Jacobian calculation step, and 'accuracy' did the trick for the step size. I don't think 'xtol' and 'eps' are needed; I just added them in case.
In the example, I have 2 variables that are initialized to 1 and bounded to [0, 15], because I'm approximating the parameters of a beta distribution, but it should apply to your case as well.

Related

Intelligent fitting with python

Is there a more intelligent function than scipy.optimize.curve_fit in Python?
I also need to define a function to fit data with.
I've spent ages trying to fit data with it. I can only fit basic functions, and fitting two lines with a piecewise function is impossible when the y-axis has low values like 0.01-0.05 and the x-axis has values like 20-60.
I know I have to plug in initial values, but it still takes too much time and sometimes it does not work.
EDIT
I added a graph of the data I fitted, where you can see the effect of changing the bounds in scipy.optimize.curve_fit.
The function I fit with is this one:
def abslines(x, a, b, c, d):
    return np.piecewise(x, [x < -b/a, x >= -b/a], [lambda x: a*x + b + d, lambda x: c*(x + b/a) + d])
The initial conditions are the same every time, and I think they are close enough:
p0=[-0.001,0.2,0.005,0.]
because the best-fit parameter values are:
[-0.00411946 0.19895546 0.00817832 0.00758401]
Bounds are:
No bounds;
bounds=([-1.,0.,0.,0.],[0.,1.,1.,1.])
bounds=([-0.5,0.01,0.0001,0.],[-0.001,0.5,0.5,1.])
bounds=([-0.1,0.01,0.0001,0.],[-0.001,0.5,0.1,1.])
bounds=([-0.01,0.1,0.0001,0.],[-0.001,0.5,0.1,1.])
starting with no bounds and ending with the best bounds.
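For reference, a minimal sketch of the call with the last set of bounds (x and y stand for the data arrays, which aren't shown here):

from scipy.optimize import curve_fit

popt, pcov = curve_fit(abslines, x, y, p0=[-0.001, 0.2, 0.005, 0.],
                       bounds=([-0.01, 0.1, 0.0001, 0.], [-0.001, 0.5, 0.1, 1.]))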
Still, I think this takes too much time, and that curve_fit should be able to do better. This way I almost have to specify the function myself; it feels like I am doing the fitting by changing parameters, rather than curve_fit fitting it.
Without knowing exactly what the regression algorithm in Python is, it is quite impossible to give a definitive answer. The calculation is probably iterative and requires initial guesses, which are probably derived from the specified bounds. So the bounds have an indirect effect on the convergence and the results.
I suggest trying a simpler algorithm (not iterative, no initial guess) from this paper: https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf
The code is easy to write in any computer language. I suppose this can be done with Python as well.
The piecewise function to be fitted consists of two line segments: f(x) = p1*x + q1 for x < a1, and f(x) = p2*x + q2 for x >= a1. The parameters to be computed are a1, p1, q1, p2 and q2.
The result is shown in the accompanying figure (not reproduced here), together with the approximate values of the parameters.
Thus no bounds need to be specified, and as a consequence there are no problems related to bounds.
NOTE: The method is based on fitting a convenient integral equation, as shown in the paper referenced above. The numerical calculation of the integral is subject to deviations if the number of points is too small. In the present case there are a large number of points, so even though the data is scattered, this is a favourable case for the practical application of the method.
Algorithms behind curve_fit expect differentiable functions, so the fit can go south if given a non-differentiable one.
For a more powerful interface to curve fitting, have a look at lmfit.

Scipy.optimize.minimize only iterates some variables.

I have written python (2.7.3) code wherein I aim to create a weighted sum of 16 data sets, and compare the result to some expected value. My problem is to find the weighting coefficients which will produce the best fit to the model. To do this, I have been experimenting with scipy's optimize.minimize routines, but have had mixed results.
Each of my individual data sets is stored as a 15x15 ndarray, so their weighted sum is also a 15x15 array. I define my own 'model' of what the sum should look like (also a 15x15 array), and quantify the goodness of fit between my result and the model using a basic least squares calculation.
R = np.sum(np.abs(model/np.max(model) - myresult)**2)
'myresult' is produced as a function of some set of parameters 'wts'. I want to find the set of parameters 'wts' which will minimise R.
To do so, I have been trying this:
res = minimize(get_best_weightings, wts, bounds=bnds, method='SLSQP', options={'disp': True, 'eps': 100})
Where my objective function is:
def get_best_weightings(wts):
    wts_tr = wts[0:16]
    wts_ti = wts[16:32]
    for i, j in enumerate(portlist):
        originalwtsr[j] = wts_tr[i]
        originalwtsi[j] = wts_ti[i]
    realwts = originalwtsr
    imagwts = originalwtsi
    myresult = make_weighted_beam(realwts, imagwts, 1)
    R = np.sum((np.abs(modelbeam/np.max(modelbeam) - myresult))**2)
    return R
The input (wts) is an ndarray of shape (32,), and the output, R, is just some scalar, which should get smaller as my fit gets better. By my understanding, this is exactly the sort of problem ("Minimization of scalar function of one or more variables.") which scipy.optimize.minimize is designed to optimize (http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.optimize.minimize.html).
However, when I run the code, although the optimization routine seems to iterate over different values of all the elements of wts, only a few of them seem to 'stick'. That is, all but four of the values are returned with the same values as my initial guess. To illustrate, I plot the values of my initial guess for wts (in blue) and the optimized values in red. You can see that for most elements the two lines overlap.
Image:
http://imgur.com/p1hQuz7
Changing just these few parameters is not enough to get a good answer, and I can't understand why the other parameters aren't also being optimised. I suspect that maybe I'm not understanding the nature of my minimization problem, so I'm hoping someone here can point out where I'm going wrong.
I have experimented with a variety of minimize's built-in methods (I am by no means committed to SLSQP, or certain that it's the most appropriate choice), and with a variety of 'step sizes' eps. The bounds I am using for my parameters are all (-4000, 4000). I only have scipy version 0.11, so I haven't tested a basinhopping routine to get the global minimum (this needs 0.12). I have looked at scipy.optimize.brute, but haven't tried implementing it yet - thought I'd check if anyone can steer me in a better direction first.
Any advice appreciated! Sorry for the wall of text and the possibly (probably?) idiotic question. I can post more of my code if necessary, but it's pretty long and unpolished.

Maximum Likelihood Estimate pseudocode

I need to code a Maximum Likelihood Estimator to estimate the mean and variance of some toy data. I have a vector with 100 samples, created with numpy.random.randn(100). The data should have a zero-mean, unit-variance Gaussian distribution.
I checked Wikipedia and some extra sources, but I am a little bit confused since I don't have a statistics background.
Is there any pseudo code for a maximum likelihood estimator? I get the intuition of MLE but I cannot figure out where to start coding.
The wiki says to take the argmax of the log-likelihood. What I understand is: I need to calculate the log-likelihood using different parameters, and then take the parameters that give the maximum probability. What I don't get is: where do I find the parameters in the first place? If I randomly try different means and variances to get a high probability, when should I stop trying?
I just came across this, and I know it's old, but I'm hoping someone else benefits from it. Although the previous comments give pretty good descriptions of what ML optimization is, no one gave pseudo-code to implement it. Python has a minimizer in SciPy that will do this. Here's pseudo-code for a linear regression.
# import the packages
import numpy as np
from scipy.optimize import minimize
import scipy.stats as stats

# Set up your x values
x = np.linspace(0, 100, num=100)

# Set up your observed y values with a known slope (2.4), intercept (5), and sd (4)
yObs = 5 + 2.4*x + np.random.normal(0, 4, 100)

# Define the likelihood function where params is a list of initial parameter estimates
def regressLL(params):
    # Resave the initial parameter guesses
    b0 = params[0]
    b1 = params[1]
    sd = params[2]

    # Calculate the predicted values from the initial parameter guesses
    yPred = b0 + b1*x

    # Calculate the negative log-likelihood as the negative sum of the log of a normal
    # PDF where the observed values are normally distributed around the mean (yPred)
    # with a standard deviation of sd
    logLik = -np.sum(stats.norm.logpdf(yObs, loc=yPred, scale=sd))

    # Tell the function to return the NLL (this is what will be minimized)
    return logLik

# Make a list of initial parameter guesses (b0, b1, sd)
initParams = [1, 1, 1]

# Run the minimizer
results = minimize(regressLL, initParams, method='nelder-mead')

# Print the results. They should be really close to your actual values
print(results.x)
This works great for me. Granted, this is just the basics. It doesn't profile or give CIs on the parameter estimates, but it's a start. You can also use ML techniques to find estimates for, say, ODEs and other models, as I describe here.
I know this question was old, hopefully you've figured it out since then, but hopefully someone else will benefit.
If you do maximum likelihood calculations, the first step you need to take is the following: assume a distribution that depends on some parameters. Since you generate your data (you even know your parameters), you "tell" your program to assume a Gaussian distribution. However, you don't tell your program your parameters (0 and 1); you leave them unknown a priori and compute them afterwards.
Now, you have your sample vector (let's call it x; its elements are x[0] to x[99]) and you have to process it. To do so, you have to compute the following (f denotes the probability density function of the Gaussian distribution):
f(x[0]) * ... * f(x[99])
As you can see in my given link, f employs two parameters (the Greek letters µ and σ). You now have to calculate the values of µ and σ such that f(x[0]) * ... * f(x[99]) takes the maximum possible value.
When you've done that, µ is your maximum likelihood value for the mean, and σ is the maximum likelihood value for standard deviation.
Note that I don't explicitly tell you how to compute the values for µ and σ, since this is a quite mathematical procedure I don't have at hand (and probably I would not understand it); I just tell you the technique to get the values, which can be applied to any other distributions as well.
Since you want to maximize the original term, you can "simply" maximize the logarithm of the original term - this saves you from dealing with all these products, and transforms the original term into a sum with some summands.
If you really want to calculate it, you can do some simplifications that lead to the following term (hope I didn't mess up anything):
log L(mu, sigma) = -(n/2) * log(2*pi*sigma^2) - (1/(2*sigma^2)) * sum_i (x[i] - mu)^2
Now, you have to find values for µ and σ such that the above beast is maximal. Doing that is a very nontrivial task called nonlinear optimization.
One simplification you could try is the following: fix one parameter and optimize the other. This saves you from dealing with two variables at the same time.
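A minimal sketch of that fix-one, optimize-the-other idea for the Gaussian case, using scipy's 1-D minimizer (the starting values and iteration count are arbitrary):

import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

x = np.random.randn(100)  # the toy data from the question

def nll(mu, sigma):
    # negative log-likelihood of the sample under N(mu, sigma)
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))

mu, sigma = 0.5, 2.0  # arbitrary starting values
for _ in range(10):
    # fix sigma, optimize mu; then fix mu, optimize sigma
    mu = minimize_scalar(lambda m: nll(m, sigma)).x
    sigma = minimize_scalar(lambda s: nll(mu, s), bounds=(1e-6, 10.0), method='bounded').x
print(mu, sigma)  # should approach the sample mean and standard deviation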
You need a numerical optimisation procedure. Not sure if anything is implemented in Python, but if it is then it'll be in numpy or scipy and friends.
Look for things like 'the Nelder-Mead algorithm', or 'BFGS'. If all else fails, use Rpy and call the R function 'optim()'.
These functions work by searching the function space and trying to work out where the maximum is. Imagine trying to find the top of a hill in fog. You might just try always heading up the steepest way. Or you could send some friends off with radios and GPS units and do a bit of surveying. Either method could lead you to a false summit, so you often need to do this a few times, starting from different points. Otherwise you may think the south summit is the highest when there's a massive north summit overshadowing it.
As joran said, the maximum likelihood estimates for the normal distribution can be calculated analytically. The answers are found by finding the partial derivatives of the log-likelihood function with respect to the parameters, setting each to zero, and then solving both equations simultaneously.
In the case of the normal distribution you would derive the log-likelihood with respect to the mean (mu) and then deriving with respect to the variance (sigma^2) to get two equations both equal to zero. After solving the equations for mu and sigma^2, you'll get the sample mean and sample variance as your answers.
See the wikipedia page for more details.
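In code, the closed-form answers are one line each (note that the MLE of the variance divides by n, not n-1, which is what np.var does by default):

import numpy as np

x = np.random.randn(100)
mu_mle = np.mean(x)    # sample mean
var_mle = np.var(x)    # MLE variance (divides by n)
print(mu_mle, var_mle)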

Python: Plotting a power law function with exponential cutoff

I have a graph between 2 functions f and g.
I know it follows a power law function with exponential cutoff.
f(x) = x**(-alpha)*e**(-lambda*x)
How do I find the value of exponent alpha?
If you have sufficiently close x points (for example one every 0.1), you can try the following:
ln(f(x)) = -alpha ln(x) - lambda x
ln(f(x))' = - alpha / x - lambda
So depending on where you have your points:
If you have a lot of points near 0, you can try:
h(x) = x ln(f(x))' = -alpha - lambda x
So the limit of the function h when x goes to 0 is -alpha
If you have large values of x, the function x -> ln(f(x))' tends toward -lambda when x goes to infinity, so you can estimate lambda and use pwdyson's expression.
If you don't have close x points, the numerical derivative will be very noisy, so I would try to guess lambda as the limit of -ln(f(x))/x for large x's...
If you don't have large values, but a large number of x's, you can try a minimization of
sum_x_i (ln(y_i) + alpha ln(x_i) + lambda x_i) ^2
on both alpha and lambda (I guess it would be more precise than using the initial expression)...
It is a simple least-squares regression (numpy.linalg.lstsq will do the job).
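A sketch of that regression on stand-in data (real x and y would come from your measurements; the "true" exponents 1.5 and 0.3 are arbitrary):

import numpy as np

x = np.linspace(1.0, 10.0, 50)             # stand-in data
y = x**-1.5 * np.exp(-0.3*x)
# linearized model: ln(y) = -alpha*ln(x) - lambda*x
A = np.column_stack([np.log(x), x])
coef = np.linalg.lstsq(A, np.log(y), rcond=None)[0]
alpha, lam = -coef[0], -coef[1]
print(alpha, lam)                          # recovers ~1.5 and ~0.3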
So you have plenty of methods; the one to choose really depends on your inputs.
The usual and general way of doing what you want is to perform a non-linear regression (even though, as noted in another response, it is possible to linearize the problem). Python can do this quite easily with the help of the SciPy package, which is used by many scientists.
The routine you are looking for is its least-square optimization routine (scipy.optimize.leastsq). Once you wrap your head around the way this general optimization procedure works (see the example), you will probably find many other opportunities to use it. Basically, you calculate the list of differences between your measurements and their ideal value f(x), and you ask SciPy to find the parameters that make these differences as small as possible, so that your data fits the model as well as possible. This then gives you the parameter you are looking for.
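For instance, a minimal sketch of such a fit (the data here are stand-ins with arbitrary "true" parameters; your measured points would go in xdata and ydata):

import numpy as np
from scipy.optimize import leastsq

xdata = np.linspace(1.0, 10.0, 50)
ydata = xdata**-1.5 * np.exp(-0.3*xdata) * (1 + 0.01*np.random.randn(50))

def residuals(params):
    # differences between the measurements and the model f(x) = x**-alpha * exp(-lam*x)
    alpha, lam = params
    return ydata - xdata**-alpha * np.exp(-lam*xdata)

params, ier = leastsq(residuals, x0=[1.0, 0.1])
print(params)  # approximately [1.5, 0.3]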
It sounds like you might be trying to fit a power-law to a distribution with an exponential cutoff at the low end due to incompleteness - but I may be reading too far into your problem.
If that is the problem you're dealing with, this website (and accompanying publication) addresses the issue: http://tuvalu.santafe.edu/~aaronc/powerlaws/. I wrote the python implementation of the power-law fitter on that page; it is linked from there.
If you know that the points follow this law exactly, then invert the equation and put in an x and its corresponding f(x) value:
import math
# note: "lambda" is a reserved word in Python, so use another name for it
alpha = -(lam*x + math.log(f(x))) / math.log(x)
But if the points do not exactly fit the equation you will need to do some sort of regression to determine alpha.
EDIT: Ok, so they don't fit exactly. This is getting beyond a Python question, but there may be something in numpy that can handle it. Here is a numpy linear regression recipe; note that taking logarithms rearranges your equation into a linear form (see the other answers), but otherwise you'll have to look into non-linear regression.

Can someone explain why scipy.integrate.quad gives different results for equally long ranges while integrating sin(X)?

I am trying to numerically integrate an arbitrary function (known when I code) in my program, using numerical integration methods. I am using Python 2.5.2 along with SciPy's numerical integration package. In order to get a feel for it, I decided to try integrating sin(x) and observed this behavior:
>>> from math import pi
>>> from scipy.integrate import quad
>>> from math import sin
>>> def integrand(x):
... return sin(x)
...
>>> quad(integrand, -pi, pi)
(0.0, 4.3998892617846002e-14)
>>> quad(integrand, 0, 2*pi)
(2.2579473462709165e-16, 4.3998892617846002e-14)
I find this behavior odd because:
1. In ordinary integration, integrating over the full cycle gives zero.
2. In numerical integration, this isn't necessarily the case, because you may just be approximating the total area under the curve.
In any case, whether assuming 1 or assuming 2, I find the behavior inconsistent. Either both integrations (-pi to pi and 0 to 2*pi) should return 0.0 (the first value in the tuple is the result and the second is the error) or both should return 2.257...
Can someone please explain why this is happening? Is this really an inconsistency? Can someone also tell me if I am missing something really basic about numerical methods?
In any case, in my final application, I plan to use the above method to find the arc length of a function. If someone has experience in this area, please advise me on the best policy for doing this in Python.
Edit
Note
I already have the first differential values at all points in the range stored in an array.
Current error is tolerable.
End note
I have read the Wikipedia article on this. As Dimitry has pointed out, I will be integrating sqrt(1 + diff(f(x), x)^2) to get the arc length. What I wanted to ask was: is there a better approximation / best practice / faster way to do this? If more context is needed, I'll post it separately or add it here, as you wish.
The quad function comes from an old Fortran library. It works by judging, from the flatness and slope of the function it is integrating, how to adjust the step size it uses for numerical integration in order to maximize efficiency. This means you may get slightly different answers from one region to the next even if they're analytically the same.
Without a doubt both integrations should return zero. Returning something that is 1/(10 trillion) is pretty close to zero! The slight differences are due to the way quad is rolling over sin and changing its step sizes. For your planned task, quad will be all you need.
EDIT:
For what you're doing I think quad is fine. It is fast and pretty accurate. My final statement is use it with confidence unless you find something that really has gone quite awry. If it doesn't return a nonsensical answer then it is probably working just fine. No worries.
I think it is probably machine precision since both answers are effectively zero.
If you want an answer from the horse's mouth I would post this question on the scipy discussion board
I would say that a number O(10^-14) is effectively zero. What's your tolerance?
It might be that the algorithm underlying quad isn't the best. You might try another method for integration and see if that improves things. A 5th order Runge-Kutta can be a very nice general purpose technique.
It could be just the nature of floating point numbers: "What Every Computer Scientist Should Know About Floating Point Arithmetic".
This output seems correct to me, since you have the absolute error estimate here. The integral of sin(x) should indeed be zero over a full period (any interval of length 2*pi), in both ordinary and numerical integration, and your results are close to that value.
To evaluate the arc length you should calculate the integral of sqrt(1 + diff(f(x), x)^2), where diff(f(x), x) is the derivative of f(x). See also Arc length.
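For example, a sketch of the arc length of sin(x) over one full period (the choice of function is just for illustration):

import numpy as np
from scipy.integrate import quad

def integrand(x):
    fprime = np.cos(x)            # derivative of sin(x)
    return np.sqrt(1 + fprime**2)

length, err = quad(integrand, 0, 2*np.pi)
print(length, err)                # about 7.64 for sin over a full period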
0.0 == 2.3e-16 (absolute error tolerance 4.4e-14)
Both answers are the same and correct i.e., zero within the given tolerance.
The difference comes from the fact that sin(x) = -sin(-x) exactly, even in finite precision, whereas finite precision only gives sin(x) ~ sin(x + 2*pi) approximately. Sure, it would be nice if quad were smart enough to figure this out, but it really has no way of knowing a priori that the integrals over the two intervals you give are equivalent, or that the first is a better result.
