I have written Python (2.7.3) code in which I create a weighted sum of 16 data sets and compare the result to some expected value. My problem is to find the weighting coefficients that produce the best fit to the model. To do this, I have been experimenting with scipy's optimize.minimize routines, but have had mixed results.
Each of my individual data sets is stored as a 15x15 ndarray, so their weighted sum is also a 15x15 array. I define my own 'model' of what the sum should look like (also a 15x15 array), and quantify the goodness of fit between my result and the model using a basic least squares calculation.
R=np.sum(np.abs(model/np.max(model)-myresult)**2)
'myresult' is produced as a function of some set of parameters 'wts'. I want to find the set of parameters 'wts' which will minimise R.
To do so, I have been trying this:
res = minimize(get_best_weightings,wts,bounds=bnds,method='SLSQP',options={'disp':True,'eps':100})
Where my objective function is:
def get_best_weightings(wts):
    # first 16 entries are the real weights, last 16 the imaginary weights
    wts_tr = wts[0:16]
    wts_ti = wts[16:32]
    for i, j in enumerate(portlist):
        originalwtsr[j] = wts_tr[i]
        originalwtsi[j] = wts_ti[i]
    realwts = originalwtsr
    imagwts = originalwtsi
    myresult = make_weighted_beam(realwts, imagwts, 1)
    R = np.sum((np.abs(modelbeam/np.max(modelbeam) - myresult))**2)
    return R
The input (wts) is an ndarray of shape (32,), and the output, R, is just a scalar, which should get smaller as the fit gets better. By my understanding, this is exactly the sort of problem ("Minimization of scalar function of one or more variables.") that scipy.optimize.minimize is designed to solve (http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.optimize.minimize.html).
However, when I run the code, although the optimization routine seems to iterate over different values of all the elements of wts, only a few of them seem to 'stick'. That is, all but four of the values are returned unchanged from my initial guess. To illustrate, I plot the values of my initial guess for wts (in blue) and the optimized values (in red). You can see that for most elements the two lines overlap.
Image:
http://imgur.com/p1hQuz7
Changing just these few parameters is not enough to get a good answer, and I can't understand why the other parameters aren't also being optimised. I suspect that maybe I'm not understanding the nature of my minimization problem, so I'm hoping someone here can point out where I'm going wrong.
I have experimented with a variety of minimize's built-in methods (I am by no means committed to SLSQP, or certain that it's the most appropriate choice), and with a variety of 'step sizes' eps. The bounds I am using for my parameters are all (-4000, 4000). I only have scipy version 0.11, so I haven't tested the basinhopping routine to find the global minimum (this needs 0.12). I have looked at scipy.optimize.brute, but haven't tried implementing it yet - thought I'd check if anyone can steer me in a better direction first.
Any advice appreciated! Sorry for the wall of text and the possibly (probably?) idiotic question. I can post more of my code if necessary, but it's pretty long and unpolished.
Related
I have a python function that takes a bunch (1 or 2) of arguments and returns a 2D array. I have been trying to use scipy curve_fit and least_squares to optimize the input arguments so that the resultant 2D array matches another, pre-made 2D array. Both methods kept returning my initial guess as the converged solution. After ripping much hair from my head, I figured out that the issue was that the small increment applied to the initial guess is too small to make any difference in the 2D array that my function returns (the cell values in the array are quantized, not continuous), and hence scipy assumes that it has reached convergence (or a local minimum) at the initial guess.
I was wondering if there is a way around this (such as forcing it to use a bigger increment while guessing).
Thanks.
I ran into a very similar problem recently, and it turns out that these kinds of optimizers only work for continuously differentiable functions. That's why they return the initial parameters: the function you want to fit cannot be differentiated. In my case, I could manually make my fit function differentiable by first fitting a polynomial to it before plugging it into the curve_fit optimizer.
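As a rough sketch of that idea (the names and the quantized residual below are made up for illustration, not taken from the question): sample the objective on a coarse grid of parameter values, fit a smooth polynomial to those samples, and then minimize the polynomial instead of the raw, step-like objective.

import numpy as np
from scipy.optimize import minimize_scalar

def residual(k):
    # hypothetical quantized objective: flat under small changes in k
    return float(int(10 * (k - 3.2) ** 2))

# sample the quantized objective on a coarse grid
ks = np.linspace(0.0, 10.0, 41)
rs = np.array([residual(k) for k in ks])

# fit a smooth polynomial surrogate to the sampled values
surrogate = np.poly1d(np.polyfit(ks, rs, deg=4))

# minimize the differentiable surrogate instead of the raw objective
res = minimize_scalar(surrogate, bounds=(ks[0], ks[-1]), method='bounded')
print(res.x)  # lands near k = 3.2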
Is there a more intelligent function than scipy.optimize.curve_fit in Python?
I also need to define a function to fit data with.
I've spent ages trying to fit data with it. I can fit only basic functions, and fitting two lines with a piecewise function is impossible when the y-axis has low values like 0.01-0.05 and the x-axis has values like 20-60.
I know I have to plug in initial values, but it still takes too much time and sometimes it does not work.
EDIT
I added a graph showing the data I fitted, where you can see the effect of changing the bounds in scipy.optimize.curve_fit.
The function I fit with is this one:
def abslines(x, a, b, c, d):
    return np.piecewise(x, [x < -b/a, x >= -b/a], [lambda x: a*x+b+d, lambda x: c*(x+b/a)+d])
The initial conditions are the same every time, and I think they are close enough:
p0=[-0.001,0.2,0.005,0.]
because the parameter values from the best fit are:
[-0.00411946 0.19895546 0.00817832 0.00758401]
Bounds are:
No bounds;
bounds=([-1.,0.,0.,0.],[0.,1.,1.,1.])
bounds=([-0.5,0.01,0.0001,0.],[-0.001,0.5,0.5,1.])
bounds=([-0.1,0.01,0.0001,0.],[-0.001,0.5,0.1,1.])
bounds=([-0.01,0.1,0.0001,0.],[-0.001,0.5,0.1,1.])
starting with no bounds, ending with the best bounds.
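For reference, a minimal sketch of how these pieces go together (x and y below are placeholder data, not my actual measurements; the last bounds from the list above are used):

import numpy as np
from scipy.optimize import curve_fit

def abslines(x, a, b, c, d):
    return np.piecewise(x, [x < -b/a, x >= -b/a], [lambda x: a*x+b+d, lambda x: c*(x+b/a)+d])

# placeholder data roughly matching the ranges described above
x = np.linspace(20, 60, 50)
y = abslines(x, -0.004, 0.2, 0.008, 0.0076) + np.random.normal(0, 0.002, x.size)

p0 = [-0.001, 0.2, 0.005, 0.]
bounds = ([-0.01, 0.1, 0.0001, 0.], [-0.001, 0.5, 0.1, 1.])
popt, pcov = curve_fit(abslines, x, y, p0=p0, bounds=bounds)
print(popt)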
Still, I think this takes too much time and that curve_fit could find it better. This way I almost have to specify the function myself, and it feels like I am doing the fitting by changing parameters rather than curve_fit doing it.
Without knowing exactly which regression algorithm Python uses, it is quite impossible to give a definitive answer. The calculation is probably iterative and requires initial guesses, which are probably derived from the specified bounds. So the bounds have an indirect effect on convergence and on the results.
I suggest trying a simpler algorithm (not iterative, no initial guess) from this paper: https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf
The code is easy to write in any computer language. I suppose this can be done with Python as well.
The piecewise function to be fitted is:
The parameters to be computed are a1, p1, q1, p2 and q2.
The result is shown on the next figure, with the approximate values of the parameters.
Thus no bounds need to be specified, and as a consequence there are no problems related to bounds.
NOTE: The method is based on fitting a convenient integral equation, as shown in the paper referenced above. The numerical calculation of the integral is subject to deviations if the number of points is too small. In the present case there are a large number of points, so even though the data are scattered, this is a favourable case for the practical application of the method.
1. The algorithms behind curve_fit expect differentiable functions, so things can go south if you give them a non-differentiable one.
2. For a more powerful interface to curve fitting, have a look at lmfit.
I'm trying to use KernelPCA for reducing the dimensionality of a dataset to 2D (both for visualization purposes and for further data analysis).
I experimented with computing KernelPCA using an RBF kernel at various values of gamma, but the result is unstable:
(each frame is a slightly different value of Gamma, where Gamma is varying continuously from 0 to 1)
Looks like it is not deterministic.
Is there a way to stabilize it/make it deterministic?
Code used to generate transformed data:
from sklearn.decomposition import KernelPCA

def pca(X, gamma1):
    kpca = KernelPCA(kernel="rbf", fit_inverse_transform=True, gamma=gamma1)
    X_kpca = kpca.fit_transform(X)
    # X_back = kpca.inverse_transform(X_kpca)
    return X_kpca
KernelPCA should be deterministic and evolve continuously with gamma. It is different from RBFSampler, which does have built-in randomness in order to provide an efficient (more scalable) approximation of the RBF kernel.
However what can change in KernelPCA is the order of the principal components: in scikit-learn they are returned sorted in order of descending eigenvalue, so if you have 2 eigenvalues close to each other it could be that the order changes with gamma.
My guess (from the gif) is that this is what is happening here: the axes along which you are plotting are not constant so your data seems to jump around.
Could you provide the code you used to produce the gif?
I'm guessing it is a plot of the data points along the 2 first principal components but it would help to see how you produced it.
You could try to further inspect it by looking at the values of kpca.alphas_ (the eigenvectors) for each value of gamma.
Hope this makes sense.
EDIT: As you noted, it looks like the points are reflected across an axis; the most plausible explanation is that one of the eigenvectors flips sign (note that this does not affect the eigenvalue).
I put in a simple gist to reproduce the issue (you'll need a Jupyter notebook to run it). You can see the sign-flipping when you change the value of gamma.
As a complement, note that this kind of discrepancy happens only because you fit the KernelPCA object several times. Once you have settled on a particular gamma value and fit kpca once, you can call transform several times and get consistent results.
For the classical PCA the docs mention that:
Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction). For this reason, it is important to always use the same estimator object to transform data in a consistent fashion.
I don't know about the behavior of a single KernelPCA object that you would fit several times (I did not find anything relevant in the docs).
It does not apply to your case though as you have to fit the object with several gamma values.
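If sign flips across gamma values are indeed the issue, here is a minimal sketch of one possible workaround (a heuristic of mine, not something built into KernelPCA): fit once per gamma and flip any component whose orientation disagrees with the previous embedding.

import numpy as np
from sklearn.decomposition import KernelPCA

def aligned_embeddings(X, gammas):
    # fit KernelPCA for each gamma and keep consecutive embeddings
    # consistently oriented by flipping sign-reversed components
    previous = None
    results = []
    for g in gammas:
        emb = KernelPCA(n_components=2, kernel="rbf", gamma=g).fit_transform(X)
        if previous is not None:
            for col in range(emb.shape[1]):
                if np.dot(emb[:, col], previous[:, col]) < 0:
                    emb[:, col] *= -1
        previous = emb
        results.append(emb)
    return results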
So... I can't give you a definitive answer on why KernelPCA is not deterministic. The behavior resembles the differences I've observed between the results of PCA and RandomizedPCA. PCA is deterministic, but RandomizedPCA is not, and sometimes the eigenvectors are flipped in sign relative to the PCA eigenvectors.
That leads me to my vague idea of how you might get more deterministic results....maybe. Use RBFSampler with a fixed seed:
from sklearn.decomposition import PCA
from sklearn.kernel_approximation import RBFSampler

def pca(X, gamma1):
    # approximate the RBF kernel feature map with a fixed random seed,
    # then run ordinary (deterministic) PCA on those features
    kernvals = RBFSampler(gamma=gamma1, random_state=0).fit_transform(X)
    X_kpca = PCA(n_components=2).fit_transform(kernvals)
    return X_kpca
I have a function compare_images(k, a, b) that compares two 2d-arrays a and b.
Inside the function, I apply a gaussian_filter with sigma=k to a. My idea is to estimate how much I must smooth image a in order for it to be similar to image b.
The problem is that my function compare_images will only return different values if the variation in k is over 0.5, and if I do fmin(compare_images, init_guess, (a, b)) it usually gets stuck at the init_guess value.
I believe the problem is that fmin (and minimize) tend to start with very small steps, which in my case reproduce the exact same return value for compare_images, so the method thinks it has already found a minimum. It only tries a couple of times.
Is there a way to force fmin or any other minimizing function from scipy to take larger steps? Or is there any method better suited for my need?
EDIT:
I found a temporary solution.
First, as recommended, I used xtol=0.5 and higher as an argument to fmin.
Even then, I still had some problems, and a few times fmin would return init_guess.
I then created a simple loop so that whenever fmin returned the init_guess, I would generate another, random init_guess and try again.
It's pretty slow, of course, but now I got it to run. It will take 20h or so to run it for all my data, but I won't need to do it again.
Anyway, to better explain the problem for those still interested in finding a better solution:
I have 2 images, A and B, containing some scientific data.
A looks like a few dots with variable values (it's a matrix in which each valued point represents where an event occurred and its intensity).
B looks like a smoothed heatmap (it is the observed density of occurrences)
B looks just as if you had applied a gaussian filter to A, with a bit of semi-random noise.
We are approximating B by applying a gaussian filter with constant sigma to A. This sigma was chosen visually, but only works for a certain class of images.
I'm trying to obtain an optimal sigma for each image, so later I could find some relations of sigma and the class of event showed in each image.
Anyway, thanks for the help!
Quick check: you probably really meant fmin(compare_images, init_guess, (a,b))?
If gaussian_filter behaves as you say, your function is piecewise constant, meaning that optimizers relying on derivatives (i.e. most of them) are out. You can try a global optimizer like anneal, or brute-force search over a sensible range of k's.
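A minimal sketch of the brute-force option (the compare_images below is a generic stand-in using a sum of squared differences, not your quantized version; a and b are random placeholders):

import numpy as np
from scipy.ndimage import gaussian_filter

def compare_images(k, a, b):
    # sum of squared differences after smoothing a with sigma=k
    return np.sum((gaussian_filter(a, sigma=k) - b) ** 2)

# placeholder images; b is a smoothed version of a
a = np.random.rand(50, 50)
b = gaussian_filter(a, sigma=3.0)

# brute-force search over a sensible range of sigmas
ks = np.arange(0.5, 10.0, 0.5)
scores = [compare_images(k, a, b) for k in ks]
best_k = ks[int(np.argmin(scores))]
print(best_k)  # should land near 3.0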
However, as you described the problem, in general there will only be a clear, global minimum of compare_images if b is a smoothed version of a. Your approach makes sense if you want to determine the amount of smoothing of a that makes both images most similar.
If the question is "how similar are the images", then I think pixelwise comparison (maybe with a bit of smoothing) is the way to go. Depending on what images we are talking about, it might be necessary to align the images first (e.g. for comparing photographs). Please clarify :-)
edit: Another idea that might help: rewrite compare_images so that it calculates two versions of smoothed-a -- one with sigma=floor(k) and one with ceil(k) (i.e. round k to the next-lower/higher int). Then calculate a_smooth = a_floor*(1-kfrac)+a_ceil*kfrac, with kfrac being the fractional part of k. This way the compare function becomes continuous w.r.t k.
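A minimal sketch of that interpolation trick (assuming gaussian_filter is the smoothing step and a pixelwise sum of squared differences is the comparison; both are assumptions, since I have not seen the actual compare_images):

import numpy as np
from scipy.ndimage import gaussian_filter

def compare_images_continuous(k, a, b):
    # blend the floor- and ceil-sigma smoothings so the objective
    # varies continuously with k instead of in coarse steps
    k_floor, k_ceil = np.floor(k), np.ceil(k)
    kfrac = k - k_floor
    a_floor = a if k_floor == 0 else gaussian_filter(a, sigma=k_floor)
    a_ceil = a if k_ceil == 0 else gaussian_filter(a, sigma=k_ceil)
    a_smooth = a_floor * (1 - kfrac) + a_ceil * kfrac
    return np.sum((a_smooth - b) ** 2)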
Good Luck!
Basin hopping may do a bit better, as it has a high chance of continuing anyway when it gets stuck on a plateau.
I found on this example function that it does reasonably well with a low temperature:
>>> opt.basinhopping(lambda (x,y): int(0.1*x**2 + 0.1*y**2), (5,-5), T=.1)
nfev: 409
fun: 0
x: array([ 1.73267813, -2.54527514])
message: ['requested number of basinhopping iterations completed successfully']
njev: 102
nit: 100
I realize this is an old question, but I haven't been able to find much discussion of similar topics. I am facing a similar issue with scipy.optimize.least_squares. I found that xtol did not do me much good; it did not seem to change the step size at all. What made a big difference was diff_step. This sets the step size taken when numerically estimating the Jacobian according to the formula step_size = x_i * diff_step, where x_i is each independent variable. You are using fmin, so you aren't calculating Jacobians, but if you used another scipy function like minimize for the same problem, this might help you.
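A minimal sketch of the diff_step idea (the residual function below is a made-up quantized stand-in, just to show where the argument goes):

import numpy as np
from scipy.optimize import least_squares

def residuals(p):
    # hypothetical quantized residuals: flat under tiny parameter changes
    return np.round(10 * (p - np.array([3.0, -1.0]))) / 10.0

x0 = np.array([1.0, 1.0])
# take ~10% relative steps when numerically estimating the Jacobian
res = least_squares(residuals, x0, diff_step=0.1)
print(res.x)  # moves well away from the initial guess despite the quantization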
I had the same problem and got it to work with the 'TNC' method.
res = minimize(f, [1] * 2, method = 'TNC', bounds=[(0,15)] * 2, jac = '2-point', options={'disp': True, 'finite_diff_rel_step': 0.1, 'xtol': 0.1, 'accuracy': 0.1, 'eps': 0.1})
The combination of 'finite_diff_rel_step' and setting 'jac' to one of {'2-point', '3-point', 'cs'} did the trick for the Jacobian calculation step, and 'accuracy' did the trick for the step size. I don't think 'xtol' and 'eps' are needed; I just added them in case.
In the example, I have 2 variables that are initialized to 1 and bounded to [0, 15], because I'm approximating the parameters of a beta distribution, but it should apply to your case as well.
I have a list of many float numbers, representing the duration of an operation performed several times.
For each type of operation, I have a different trend in the numbers.
I'm aware of the many random generators provided by some python modules, like numpy.random.
For example, there are binomial, exponential, normal, Weibull, and so on...
I'd like to know if there's a way to find, given a list of values, the random generator that best fits that list of numbers.
I.e., the generator (with its parameters) that best fits the trend of the numbers in the list.
That's because I'd like to automate the generation of time lengths for each operation, so that I can simulate it over n years, without having to find by hand which method best fits which list of numbers.
EDIT: In other words, trying to clarify the problem:
I have a list of numbers. I'm trying to find the probability distribution that best fits the array of numbers I already have. The only problem I see is that each probability distribution has input parameters that may affect the result. So I'll have to figure out how to set these parameters automatically, trying to best fit the list.
Any idea?
You might find it better to think about this in terms of probability distributions, rather than thinking about random number generators. You can then think in terms of testing goodness of fit for your different distributions.
As a starting point, you might try constructing probability plots for your samples. Probably the easiest in terms of the math behind it would be to consider a Q-Q plot. Using the random number generators, create a sample of the same size as your data. Sort both of these, and plot them against one another. If the distributions are the same, then you should get a straight line.
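A minimal sketch of that Q-Q check (an exponential candidate is assumed here; data is a placeholder for your list of durations):

import numpy as np
import matplotlib.pyplot as plt

data = np.random.exponential(scale=2.0, size=500)   # stand-in for your list

# draw a sample of the same size from the candidate generator
candidate = np.random.exponential(scale=np.mean(data), size=data.size)

# sort both and plot one against the other; a straight line means a good fit
plt.plot(np.sort(data), np.sort(candidate), '.')
plt.plot([0, data.max()], [0, data.max()], 'k--')   # reference y = x line
plt.xlabel('sorted data')
plt.ylabel('sorted candidate sample')
plt.show()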
Edit: To find appropriate parameters for a statistical model, maximum likelihood estimation is a standard approach. Depending on how many samples of numbers you have and the precision you require, you may well find that just playing with the parameters by hand will give you a "good enough" solution.
Why using random numbers for this is a bad idea has already been explained. It seems to me that what you really need is to fit the distributions you mentioned to your points (for example, with a least squares fit), then check which one fits the points best (for example, with a chi-squared test).
EDIT: adding a reference to a numpy least squares fitting example.
Given a parameterized univariate distribution (e.g. the exponential depends on lambda, the gamma depends on theta and k), the way to find the parameter values that best fit a given sample of numbers is called maximum likelihood estimation. It is not a least squares procedure, which would require binning and thus losing information! Some Wikipedia distribution articles give expressions for the maximum likelihood estimates of the parameters, but many do not, and even the ones that do are missing expressions for error bars and covariances. If you know calculus, you can derive these results yourself by expressing the log likelihood of your data set in terms of the parameters, setting its first derivative to zero to maximize it, and using the inverse of the curvature (Hessian) matrix at the maximum as the covariance matrix of your parameters.
Given two different fits to two different parameterized distributions, the way to compare them is called the likelihood ratio test. Basically, you just pick the one with the larger log likelihood.
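A minimal sketch of that comparison using scipy.stats (the candidate distributions and the synthetic data are just placeholders):

import numpy as np
from scipy import stats

data = np.random.gamma(shape=2.0, scale=1.5, size=1000)   # stand-in for your list

candidates = [stats.expon, stats.gamma, stats.weibull_min, stats.norm]
for dist in candidates:
    params = dist.fit(data)                        # maximum likelihood fit
    loglike = np.sum(dist.logpdf(data, *params))   # log likelihood of that fit
    print(dist.name, params, loglike)
# pick the candidate with the largest log likelihood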
Gabriel, if you have access to Mathematica, parameter estimation is built in:
In[43]:= data = RandomReal[ExponentialDistribution[1], 10]
Out[43]= {1.55598, 0.375999, 0.0878202, 1.58705, 0.874423, 2.17905, \
0.247473, 0.599993, 0.404341, 0.31505}
In[44]:= EstimatedDistribution[data, ExponentialDistribution[la],
ParameterEstimator -> "MaximumLikelihood"]
Out[44]= ExponentialDistribution[1.21548]
In[45]:= EstimatedDistribution[data, ExponentialDistribution[la],
ParameterEstimator -> "MethodOfMoments"]
Out[45]= ExponentialDistribution[1.21548]
However, it is easy to figure out by hand what the maximum likelihood method says the parameter should be.
In[48]:= Simplify[
D[LogLikelihood[ExponentialDistribution[la], {x}], la], x > 0]
Out[48]= 1/la - x
Hence the maximum likelihood condition for the exponential distribution is sum_i (1/la - x_i) = 0, from which la = 1/Mean[data]. Similar equations can be worked out for other distribution families and coded in the language of your choice.
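For reference, the same estimate in Python (a sketch using numpy and scipy.stats rather than Mathematica):

import numpy as np
from scipy import stats

data = np.random.exponential(scale=1.0, size=10)

la_hat = 1.0 / np.mean(data)                  # maximum likelihood rate estimate
loc, scale = stats.expon.fit(data, floc=0)    # scipy parameterizes by scale = 1/la
print(la_hat, 1.0 / scale)                    # the two estimates agree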