The use of scipy.optimize.curve_fit() has been important in my current (astrophysics-related) research project. Now that I'm working on a publication, I want to reference scipy.optimize.curve_fit() in my paper. The current draft refers to curve_fit() as follows:
...are fit using the curve_fit() function in the optimize module
of SciPy.
I want to make sure that my use of the words "function" and "module" is correct. I am still learning the structure of modules, methods, and functions in Python and want to be sure I am referring to them correctly.
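As a quick sanity check of the terminology (a minimal snippet, independent of my actual fitting code), Python itself can confirm both words:

from scipy import optimize
import types

# curve_fit is a plain Python function, and optimize is a module
assert isinstance(optimize.curve_fit, types.FunctionType)
print(type(optimize))  # <class 'module'>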
Bonus: The SciPy website's citation guidelines state:
For any specific algorithm, also consider citing the original author’s paper (this can often be found under the “References” section of the docstring).
As far as I can tell, curve_fit() has no references specified in its docstring, and neither does leastsq(), which it relies on heavily. I am planning on just citing the general SciPy library (as the citation guidelines on the website specify) rather than a paper for the specific algorithm. Is there a more specific reference anyone can point me to?
The notes in the SciPy reference for curve_fit() indicate that it uses the Levenberg-Marquardt algorithm through leastsq(). The notes under leastsq() say it is a wrapper around MINPACK's lmdif and lmder algorithms. Under scipy.optimize.root(), the docs mention the Levenberg-Marquardt implementation in MINPACK and point to: Moré, Jorge J., Burton S. Garbow, and Kenneth E. Hillstrom. 1980. User Guide for MINPACK-1. (R102 in the SciPy 0.13.0 Reference Guide bibliography).
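For context, here is a minimal example of the call in question; with unconstrained parameters, curve_fit() defaults to the Levenberg-Marquardt method via leastsq(), as those notes describe (the model and data below are made up for illustration):

import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * np.exp(-b * x)

xdata = np.linspace(0, 4, 50)
ydata = model(xdata, 2.5, 1.3) + 0.05 * np.random.default_rng(0).normal(size=50)

# unbounded parameters, so curve_fit uses method='lm' (MINPACK's lmdif/lmder)
popt, pcov = curve_fit(model, xdata, ydata)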
I'm doing my final degree project. I need to create an extended version of the word2vec algorithm that changes the default objective function of the original paper. This has already been done (check this paper). In that paper, they only state the new objective function; they do not say how they ran the model.
Now I need to extend that model too, with another function, but I'm not sure whether I have to implement word2vec myself with the new function, or whether there is a way to replace the objective in the Gensim word2vec implementation.
I have checked the Gensim Word2Vec documentation, but I have not seen any parameter for this. Do you have any idea how to do it? Is it even possible?
I was unsure if this Stack Exchange site was the correct one; maybe https://ai.stackexchange.com/ is more appropriate.
There's no official support in Gensim for simply dropping in your own objective function.
However, the full source code is available – https://github.com/RaRe-Technologies/gensim – so by editing it, or using it as a model for your own implementation, you could theoretically do anything.
Beware, though:
the code has gone through a lot of optimization and customization for options that may not be relevant to your needs, so it may not be the cleanest or simplest starting point
for performance, the core routines are written in Cython – see the .pyx files – which can be especially hard to debug, and they rely on bulk array library functions that can obscure where to plug in your alternate objective (see the sketch below)
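To make that second point concrete, here is a minimal pure-numpy sketch of the standard skip-gram negative-sampling update that those Cython routines implement in optimized, batched form. The function name and array shapes are illustrative, not Gensim API; the sigmoid-based loss terms are exactly what a custom objective would replace:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_pair_update(v, u_pos, U_neg, lr=0.025):
    # v: center-word vector; u_pos: context-word vector;
    # U_neg: (k, dim) matrix of negative-sample vectors
    g_pos = sigmoid(v @ u_pos) - 1.0          # loss gradient for the true pair
    g_neg = sigmoid(U_neg @ v)                # one loss term per negative sample
    grad_v = g_pos * u_pos + U_neg.T @ g_neg  # gradient w.r.t. the center vector
    u_pos -= lr * g_pos * v                   # in-place SGD steps
    U_neg -= lr * np.outer(g_neg, v)
    v -= lr * grad_v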
Is there any place with a brief description of each of the algorithms available through the method parameter of the minimize function in the lmfit package? Neither there nor in the SciPy documentation is there an explanation of the details of each algorithm. Right now I know I can choose between them, but I don't know which one to choose...
My current problem
I am using lmfit in Python to minimize a function within a finite, predefined range. The function has the following characteristics:
It is almost zero everywhere, to the point of being numerically indistinguishable from zero over most of the range.
It has a very, very sharp peak at some point.
The peak can be anywhere within the region.
This causes many minimization algorithms to fail. Right now I am using a combination of the brute-force method (method="brute") to find a point close to the peak, then feeding that value to the Nelder-Mead algorithm (method="nelder") to perform the final minimization. It works roughly 50% of the time; the other 50% it fails to find the minimum. I wonder if there are better algorithms for cases like this one...
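For reference, this is the shape of my two-stage setup (a stripped-down sketch with a toy objective; the dip location x0 and the width are made up):

import numpy as np
import lmfit

def objective(params, x0=0.37, width=1e-3):
    # toy stand-in: numerically zero almost everywhere, with a very
    # narrow dip (the minimum) at the hypothetical location x0
    x = params['x'].value
    return -np.exp(-((x - x0) / width) ** 2)

params = lmfit.Parameters()
params.add('x', value=0.5, min=0.0, max=1.0, brute_step=1e-4)

coarse = lmfit.minimize(objective, params, method='brute')        # stage 1: grid scan
fine = lmfit.minimize(objective, coarse.params, method='nelder')  # stage 2: refine
print(fine.params['x'].value)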
I think it is a fair point that docs for lmfit (such as https://lmfit.github.io/lmfit-py/fitting.html#fit-methods-table) and scipy.optimize (such as https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html#optimization-scipy-optimize) do not give detailed mathematical descriptions of the algorithms.
Then again, most of the docs for scipy, numpy, and related libraries describe how to use the methods, but do not describe in much mathematical detail how the algorithms work.
In fairness, the different optimization algorithms share many features and the differences between them can get pretty technical. All of these methods try to minimize some metric (often called "cost" or "residual") by changing the values of parameters for the supplied function.
It takes a textbook (or at least a Wikipedia page) to establish the concepts and mathematical terms these methods use, and then a paper (or another Wikipedia page) to describe how each method differs from the others. So, the basic answer is to look up the different methods.
Does anyone know what method scipy.linalg.solve_banded uses to solve the system of equations? The documentation does not state the solution method used by the function. Usually the Thomas algorithm, a.k.a. TDMA, is used for these types of systems, but I was wondering if this SciPy function uses some other method.
The GitHub source shows that SciPy uses the LAPACK routine gbsv() to solve this. You can read about gbsv() in the LAPACK documentation.
I am not sure if this is the same as the Thomas algorithm. It looks like both use LU decomposition, though.
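For reference, a minimal example of calling solve_banded() on the tridiagonal case where the Thomas algorithm would classically apply (the matrix here is just an illustrative example):

import numpy as np
from scipy.linalg import solve_banded

n = 5
ab = np.zeros((3, n))      # diagonal-ordered form for a (1, 1)-banded matrix
ab[0, 1:] = -1.0           # superdiagonal
ab[1, :] = 2.0             # main diagonal
ab[2, :-1] = -1.0          # subdiagonal

b = np.ones(n)
x = solve_banded((1, 1), ab, b)

# verify against a dense solve
A = np.diag(np.full(n, 2.0)) + np.diag(np.full(n - 1, -1.0), 1) + np.diag(np.full(n - 1, -1.0), -1)
assert np.allclose(A @ x, b)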
I am working on multi-objective optimization in Matlab, using fminimax from the Optimization Toolbox. I want to know whether fminimax applies Pareto optimization, and if not, why? Also, can you suggest a multi-objective optimization package in Matlab or Python that does use Pareto optimality?
For Python, DEAP may be the one you're looking for: extensive documentation with a lot of real-life examples, and a really helpful Google Groups forum. It implements two robust MO algorithms: NSGA-II and SPEA-II.
Edit (as requested)
I am using DEAP for my MSc thesis, so I will tell you how we are using Pareto optimality. Setting DEAP up is pretty straightforward, as you will see in the examples. Use this one as a starting point; it is the short version, which uses the built-in algorithms and operators. Read both and then follow these guidelines.
As the OneMax example is single-objective, it doesn't use MO algorithms. However, it's easy to adapt:
Change your evaluation function so it returns an n-tuple with the desired scores. If you want to minimize standard deviation too, something like return sum(individual), numpy.std(individual) would work.
Also, modify the weights parameter of the base.Fitness object so it matches that returned n-tuple. A positive float means maximization, while a negative one means minimization. You can use any real number, but I would stick with 1.0 and -1.0 for the sake of simplicity.
Change your genetic operators to cxSimulatedBinaryBounded(), mutPolynomialBounded() and selNSGA2() for crossover, mutation and selection, respectively. These are the suggested methods, as they were developed by the NSGA-II authors.
If you want to use one of the ready-to-go algorithms embedded in DEAP, choose MuPlusLambda().
When calling the algorithm, remember to change the halloffame parameter from HallOfFame() to ParetoFront(). This will return all non-dominated individuals instead of a lexicographically sorted list of the "best individuals in all generations". Then you can process your Pareto front as desired: weighted sum, custom lexicographic sorting, etc.
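Putting those steps together, here is a compressed sketch (the gene bounds, population sizes, and rates are placeholder values, and the evaluation function is the sum/standard-deviation example from above):

import numpy as np
from deap import algorithms, base, creator, tools

# maximize the sum, minimize the standard deviation
creator.create("FitnessMulti", base.Fitness, weights=(1.0, -1.0))
creator.create("Individual", list, fitness=creator.FitnessMulti)

def evaluate(individual):
    return sum(individual), float(np.std(individual))

LOW, UP = 0.0, 1.0
toolbox = base.Toolbox()
toolbox.register("attr", np.random.uniform, LOW, UP)
toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attr, 10)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", evaluate)
toolbox.register("mate", tools.cxSimulatedBinaryBounded, low=LOW, up=UP, eta=20.0)
toolbox.register("mutate", tools.mutPolynomialBounded, low=LOW, up=UP, eta=20.0, indpb=0.1)
toolbox.register("select", tools.selNSGA2)

pop = toolbox.population(n=100)
hof = tools.ParetoFront()                       # instead of tools.HallOfFame()
algorithms.eaMuPlusLambda(pop, toolbox, mu=100, lambda_=100,
                          cxpb=0.6, mutpb=0.3, ngen=50, halloffame=hof)
print(len(hof), "non-dominated individuals")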
I hope that helps. Note that there's also a full, somewhat more advanced NSGA-II example available in the DEAP examples.
For fminimax and fgoalattain it looks like the answer is no. However, the genetic algorithm solver gamultiobj is Pareto set-based, though I'm not sure if it's the kind of multi-objective optimization function you want to use. gamultiobj implements the NSGA-II evolutionary algorithm. There's also this package that implements the Strength Pareto Evolutionary Algorithm 2 (SPEA-II) in C with a Matlab MEX interface. It's a bit old, so you might want to recompile it (you'll need to anyway if you're not on 32-bit Windows).
I am looking for a numpy-based implementation of ordinary least squares that would allow the fit to be updated with more observations. Something along the lines of Applied Statistics algorithm AS 274 or R's biglm.
Failing that, a routine for updating a QR decomposition with new rows would also be of interest.
Any pointers?
scikits.statsmodels has a recursive OLS in its sandbox that updates the inverse of X'X, which could be used for this (it is used only to calculate recursive OLS residuals).
Nathaniel Smith posted his code for OLS when the data is too large to fit in memory to the scipy-user mailing list. The main code updates X'X.
I think econpy also has a function for this.
Pandas has an expanding OLS, but it may not be easy to use in an online fashion.
Nathaniel's code is probably the closest to biglm. I don't think there is anything for the general linear model (error covariance different from the identity).
All of these need some work before they can be used for this. I don't know of any Python(-wrapped) code that would update a QR decomposition.
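To illustrate what "updating the inverse of X'X" amounts to, here is a minimal sketch of a rank-one (Sherman-Morrison) recursive OLS update; the class is hypothetical, not part of any of the packages above:

import numpy as np

class RecursiveOLS:
    # keeps (X'X)^-1 and beta current as single observation rows arrive
    def __init__(self, X, y):
        self.P = np.linalg.inv(X.T @ X)
        self.beta = self.P @ (X.T @ y)

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        k = Px / (1.0 + x @ Px)               # gain vector (Sherman-Morrison)
        self.beta = self.beta + k * (y - x @ self.beta)
        self.P = self.P - np.outer(k, Px)

# sanity check against a batch fit
rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 3)), rng.normal(size=50)
m = RecursiveOLS(X[:10], y[:10])
for xi, yi in zip(X[10:], y[10:]):
    m.update(xi, yi)
assert np.allclose(m.beta, np.linalg.lstsq(X, y, rcond=None)[0])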
Update:
See http://mail.scipy.org/pipermail/scipy-dev/2010-February/013853.html
There is an incremental QR and Cholesky available in CHOLMOD, but I didn't try it (license or compilation-on-Windows problems), and I don't think I ever got incremental_qr to work.
See the attachments at http://mail.scipy.org/pipermail/scipy-dev/2010-February/013844.html
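(As an aside for later readers: newer SciPy versions do ship a row-update routine, scipy.linalg.qr_insert(); a minimal sketch:)

import numpy as np
from scipy.linalg import qr, qr_insert

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))
Q, R = qr(A)                                       # full (not economic) factors

new_row = rng.normal(size=3)
Q2, R2 = qr_insert(Q, R, new_row, 6, which='row')  # append as row 6

assert np.allclose(Q2 @ R2, np.vstack([A, new_row]))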
You might try the pythonequations project at http://code.google.com/p/pythonequations/downloads/list; though it may be more than you need, it does use scipy and numpy. That code is the middleware for the http://zunzun.com online curve- and surface-fitting web site (I'm the author). The source code comes with many examples. Alternatively, the web site alone may be sufficient - please give it a try.
This is not a detailed answer yet, but:
AFAIK, a QR update like this is not implemented in numpy, but I'd like to ask you to specify in more detail what you are actually aiming for.
In particular, why would it not be acceptable to just calculate a new estimate for x (of Ax = b) from the k latest observations whenever a batch of new observations arrives? (With modern hardware, k can be quite large.)
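For what I mean, a trivial sketch (k and the names are arbitrary):

import numpy as np

def latest_k_estimate(A, b, k):
    # refit x for A x ~= b from only the k most recent observations,
    # instead of updating a factorization incrementally
    return np.linalg.lstsq(A[-k:], b[-k:], rcond=None)[0]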
The LSQ.F90 part of the file compiles easily enough with,
gfortran-4.4 -shared -fPIC -g -o lsq.so LSQ.F90
and this works in Python,
from ctypes import cdll
lsq = cdll.LoadLibrary('./lsq.so')
As soon as I figure out the function call I'll include it in this answer.