Why is numpy and scipy exp() faster than log()? - python

In general, log and exp functions should be roughly the same speed. I would expect the numpy and scipy implementations to be relative straightforward wrappers. numpy.log() and scipy.log() have similar speed as expected. However, I found that numpy.log() is ~60% slower than these exp() functions and scipy.log() is 100% slower. Does anyone one know the reason for this?

Not sure why you think that both should be "roughly the same speed". It's true that both can be calculated using a Taylor series (which, even by itself means little without analyzing the error term), but then the numerical tricks kick in.
E.g., an algebraic identity can be used to transform the original exp. Taylor series into a more efficient 2-jump power series. However, for the power series, see here a discussion of by-case optimizations, some of which involve a lookup table.
Which arguments did you give the functions - the same? the worst one for each?
What was the accuracy of the results? And how do you measure the accuracy for each: absolutely, relatively?
Edit It should be noted that these libraries can also have different backends.

Related

Solving a large (150 variable) system of linear, ordinary differential equations; running into floating point rounding and/or stiffness problems

EDIT: Original post too vague. I am looking for an algorithm to solve a large-system, solvable, linear IVP that can handle very small floating point values. Solving for the eigenvectors and eigenvalues is impossible with numpy.linalg.eig() as the returned values are complex and should not be, it does not support numpy.float128 either, and the matrix is not symmetric so numpy.linalg.eigh() won't work. Sympy could do it given an infinite amount of time, but after running it for 5 hours I gave up. scipy.integrate.solve_ivp() works with implicit methods (have tried Radau and BDF), but the output is wildly wrong. Are there any libraries, methods, algorithms, or solutions for working with this many, very small numbers?
Feel free to ignore the rest of this.
I have a 150x150 sparse (~500 nonzero entries of 22500) matrix representing a system of first order, linear differential equations. I'm attempting to find the eigenvalues and eigenvectors of this matrix to construct a function that serves as the analytical solution to the system so that I can just give it a time and it will give me values for each variable. I've used this method in the past for similar 40x40 matrices, and it's much (tens, in some cases hundreds of times) faster than scipy.integrate.solve_ivp() and also makes post model analysis much easier as I can find maximum values and maximum rates of change using scipy.optimize.fmin() or evaluate my function at inf to see where things settle if left long enough.
This time around, however, numpy.linalg.eig() doesn't seem to like my matrix and is giving me complex values, which I know are wrong because I'm modeling a physical system that can't have complex rates of growth or decay (or sinusoidal solutions), much less complex values for its variables. I believe this to be a stiffness or floating point rounding problem where the underlying LAPACK algorithm is unable to handle either the very small values (smallest is ~3e-14, and most nonzero values are of similar scale) or disparity between some values (largest is ~4000, but values greater than 1 only show up a handful of times).
I have seen suggestions for similar users' problems to use sympy to solve for the eigenvalues, but when it hadn't solved my matrix after 5 hours I figured it wasn't a viable solution for my large system. I've also seen suggestions to use numpy.real_if_close() to remove the imaginary portions of the complex values, but I'm not sure this is a good solution either; several eigenvalues from numpy.linalg.eig() are 0, which is a sign of error to me, but additionally almost all the real portions are of the same scale as the imaginary portions (exceedingly small), which makes me question their validity as well. My matrix is real, but unfortunately not symmetric, so numpy.linalg.eigh() is not viable either.
I'm at a point where I may just run scipy.integrate.solve_ivp() for an arbitrarily long time (a few thousand hours) which will probably take a long time to compute, and then use scipy.optimize.curve_fit() to approximate the analytical solutions I want, since I have a good idea of their forms. This isn't ideal as it makes my program much slower, and I'm also not even sure it will work with the stiffness and rounding problems I've encountered with numpy.linalg.eig(); I suspect Radau or BDF would be able to navigate the stiffness, but not the rounding.
Anybody have any ideas? Any other algorithms for finding eigenvalues that could handle this? Can numpy.linalg.eig() work with numpy.float128 instead of numpy.float64 or would even that extra precision not help?
I'm happy to provide additional details upon request. I'm open to changing languages if needed.
As mentioned in the comment chain above the best solution for this is to use a Matrix Exponential, which is a lot simpler (and apparently less error prone) than diagonalizing your system with eigenvectors and eigenvalues.
For my case I used scipy.sparse.linalg.expm() since my system is sparse. It's fast, accurate, and simple. My only complaint is the loss of evaluation at infinity, but it's easy enough to work around.

lmfit/scipy.optimize minimization methods description?

Is there any place with a brief description of each of the algorithms for the parameter method in the minimize function of the lmfit package? Both there and in the documentation of SciPy there is no explanation about the details of each algorithm. Right now I know I can choose between them but I don't know which one to choose...
My current problem
I am using lmfit in Python to minimize a function. I want to minimize the function within a finite and predefined range where the function has the following characteristics:
It is almost zero everywhere, which makes it to be numerically identical to zero almost everywhere.
It has a very, very sharp peak in some point.
The peak can be anywhere within the region.
This makes many minimization algorithms to not work. Right now I am using a combination of the brute force method (method="brute") to find a point close to the peak and then feed this value to the Nelder-Mead algorithm (method="nelder") to finally perform the minimization. It is working approximately 50 % of the times, and the other 50 % of the times it fails to find the minimum. I wonder if there are better algorithms for cases like this one...
I think it is a fair point that docs for lmfit (such as https://lmfit.github.io/lmfit-py/fitting.html#fit-methods-table) and scipy.optimize (such as https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html#optimization-scipy-optimize) do not give detailed mathematical descriptions of the algorithms.
Then again, most of the docs for scipy, numpy, and related libraries describe how to use the methods, but do not describe in much mathematical detail how the algorithms work.
In fairness, the different optimization algorithms share many features and the differences between them can get pretty technical. All of these methods try to minimize some metric (often called "cost" or "residual") by changing the values of parameters for the supplied function.
It sort of takes a text book (or at least a Wikipedia page) to establish the concepts and mathematical terms used for these methods, and then a paper (or at least a Wikipedia page) to describe how each method differs from the others. So, I think the basic answer would be to look up the different methods.

Scipy Linear algebra LinearOperator function utilised in Conjugate Gradient

I am preconditioning a matrix using spilu, however, to pass this preconditioner into cg (the built in conjugate gradient method) it is necessary to use the LinearOperator function, can someone explain to me the parameter matvec, and why I need to use it. Below is my current code
Ainv=scla.spilu(A,drop_tol= 1e-7)
Ainv=scla.LinearOperator(Ainv.shape,matvec=Ainv)
scla.cg(A,b,maxiter=maxIterations, M = Ainv)
However this doesnt work and I am given the error TypeError: 'SuperLU' object is not callable. I have played around and tried
Ainv=scla.LinearOperator(Ainv.shape,matvec=Ainv.solve)
instead. This seems to work but I want to know why matvec needs Ainv.solve rather than just Ainv, and is it the right thing to feed LinearOperator?
Thanks for your time
Without having much experience with this part of scipy, some comments:
According to the docs you don't have to use LinearOperator, but you might do
M : {sparse matrix, dense matrix, LinearOperator}, so you can use explicit matrices too!
The idea/advantage of the LinearOperator:
Many iterative methods (e.g. cg, gmres) do not need to know the individual entries of a matrix to solve a linear system A*x=b. Such solvers only require the computation of matrix vector products docs
Depending on the task, sometimes even matrix-free approaches are available which can be much more efficient
The working approach you presented is indeed the correct one (some other source doing it similarily, and some course-materials doing it like that)
The idea of not using the inverse matrix, but using solve() here is not to form the inverse explicitly (which might be very costly)
A similar idea is very common in BFGS-based optimization algorithms although wiki might not give much insight here
scipy has an extra LinearOperator for this not forming the inverse explicitly! (although i think it's only used for statistics / completing/finishing some optimization; but i successfully build some LBFGS-based optimizers with this one)
Source # scicomp.stackexchange discussing this without touching scipy
And because of that i would assume spilu is completely going for this too (returning an object with a solve-method)

Comparing fsolve results in python and matlab

I have a follow up question to the post written a couple days ago, thank you for the previous feedback:
Finding complex roots from set of non-linear equations in python
I have gotten the set non-linear equations set up in python now so that fsolve will handle the real and imaginary parts independently. However, there are still problems with the python "fsolve" converging to the correct solution. I have exactly the same inputs that are used in Matlab, and after double checking, the set of equations are exactly the same as well. Matlab, no matter how I set the initial values, will always converge to the correct solution. With python however, every initial condition produces a different result, and never the correct one. After a fraction of a second, the following warning appears with python:
/opt/local/Library/Frameworks/Python.framework/Versions/Current/lib/python2.7/site-packages/scipy/optimize/minpack.py:227:
RuntimeWarning: The iteration is not making good progress, as measured by the
improvement from the last ten iterations.
warnings.warn(msg, RuntimeWarning)
I was wondering if there are some known differences between the fsolve in python and Matlab, and if there are some known methods to optimize the performance in python.
Thank you very much
I don't think that you should rely on the fact that the names are the same. I see from your other question that you are specifying that Matlab's fsolve use the 'levenberg-marquardt' algorithm rather than the default. Python's scipy.optimize.fsolve uses MINPACK's hybrd algorithms. Levenberg-Marquardt finds roots approximately by minimizing the sum of squares of the function and is quite robust. It is not a true root-finding method like the default 'trust-region-dogleg' algorithm. I don't know how the hybrd schemes work, but they claim to be a modification of Powell's method.
If you want something similar to what you're doing in Matlab, I'd look for an optimization scheme that implements Levenberg-Marquardt, such as scipy.optimize.root, which you were also using in your previous question. Is there a reason why you're not using that?

Comparing Root-finding (of a function) algorithms in Python

I would like to compare different methods of finding roots of functions in python (like Newton's methods or other simple calc based methods). I don't think I will have too much trouble writing the algorithms
What would be a good way to make the actual comparison? I read up a little bit about Big-O. Would this be the way to go?
The answer from #sarnold is right -- it doesn't make sense to do a Big-Oh analysis.
The principal differences between root finding algorithms are:
rate of convergence (number of iterations)
computational effort per iteration
what is required as input (i.e. do you need to know the first derivative, do you need to set lo/hi limits for bisection, etc.)
what functions it works well on (i.e. works fine on polynomials but fails on functions with poles)
what assumptions does it make about the function (i.e. a continuous first derivative or being analytic, etc)
how simple the method is to implement
I think you will find that each of the methods has some good qualities, some bad qualities, and a set of situations where it is the most appropriate choice.
Big O notation is ideal for expressing the asymptotic behavior of algorithms as the inputs to the algorithms "increase". This is probably not a great measure for root finding algorithms.
Instead, I would think the number of iterations required to bring the actual error below some epsilon ε would be a better measure. Another measure would be the number of iterations required to bring the difference between successive iterations below some epsilon ε. (The difference between successive iterations is probably a better choice if you don't have exact root values at hand for your inputs. You would use a criteria such as successive differences to know when to terminate your root finders in practice, so you could or should use them here, too.)
While you can characterize the number of iterations required for different algorithms by the ratios between them (one algorithm may take roughly ten times more iterations to reach the same precision as another), there often isn't "growth" in the iterations as inputs change.
Of course, if your algorithms take more iterations with "larger" inputs, then Big O notation makes sense.
Big-O notation is designed to describe how an alogorithm behaves in the limit, as n goes to infinity. This is a much easier thing to work with in a theoretical study than in a practical experiment. I would pick things to study that you can easily measure that and that people care about, such as accuracy and computer resources (time/memory) consumed.
When you write and run a computer program to compare two algorithms, you are performing a scientific experiment, just like somebody who measures the speed of light, or somebody who compares the death rates of smokers and non-smokers, and many of the same factors apply.
Try and choose an example problem or problems to solve that is representative, or at least interesting to you, because your results may not generalise to sitations you have not actually tested. You may be able to increase the range of situations to which your results reply if you sample at random from a large set of possible problems and find that all your random samples behave in much the same way, or at least follow much the same trend. You can have unexpected results even when the theoretical studies show that there should be a nice n log n trend, because theoretical studies rarely account for suddenly running out of cache, or out of memory, or usually even for things like integer overflow.
Be alert for sources of error, and try to minimise them, or have them apply to the same extent to all the things you are comparing. Of course you want to use exactly the same input data for all of the algorithms you are testing. Make multiple runs of each algorithm, and check to see how variable things are - perhaps a few runs are slower because the computer was doing something else at a time. Be aware that caching may make later runs of an algorithm faster, especially if you run them immediately after each other. Which time you want depends on what you decide you are measuring. If you have a lot of I/O to do remember that modern operating systems and computer cache huge amounts of disk I/O in memory. I once ended up powering the computer off and on again after every run, as the only way I could find to be sure that the device I/O cache was flushed.
You can get wildly different answers for the same problem just by changing starting points. Pick an initial guess that's close to the root and Newton's method will give you a result that converges quadratically. Choose another in a different part of the problem space and the root finder will diverge wildly.
What does this say about the algorithm? Good or bad?
I would suggest you to have a look at the following Python root finding demo.
It is a simple code, with some different methods and comparisons between them (in terms of the rate of convergence).
http://www.math-cs.gordon.edu/courses/mat342/python/findroot.py
I just finish a project where comparing bisection, Newton, and secant root finding methods. Since this is a practical case, I don't think you need to use Big-O notation. Big-O notation is more suitable for asymptotic view. What you can do is compare them in term of:
Speed - for example here newton is the fastest if good condition are gathered
Number of iterations - for example here bisection take the most iteration
Accuracy - How often it converge to the right root if there is more than one root, or maybe it doesn't even converge at all.
Input - What information does it need to get started. for example newton need an X0 near the root in order to converge, it also need the first derivative which is not always easy to find.
Other - rounding errors
For the sake of visualization you can store the value of each iteration in arrays and plot them. Use a function you already know the roots.
Although this is a very old post, my 2 cents :)
Once you've decided which algorithmic method to use to compare them (your "evaluation protocol", so to say), then you might be interested in ways to run your challengers on actual datasets.
This tutorial explains how to do it, based on an example (comparing polynomial fitting algorithms on several datasets).
(I'm the author, feel free to provide feedback on the github page!)

Categories