In lapack there are much more functions than in "lapack interface" in scipy. Is there any reason behind this and is there OS-independentway to call lapack functions directly?
I realize that I may call dinamic library directly, but this means writing my own wrapper and this is not what I want.
To make a real usecase, I need to call dsbgv to solve generalized eigenproblem for banded matrix. It is orders of magnitude faster than using eig which is for general matrix.
scipy.linalg.lapack is an organically grown (tm) set of wrappers, added by different people with different goals, needs, motivations and time budgets over quite a few years.
cython_lapack is a complete set of wrappers for a certain (old enough) version of LAPACK. It is lower level however: you need to supply all lapack arguments, ensure the correct array ordering, alignment etc.
Related
I want to implement the adjoint sensitivity analysis in python, in order to determine the gradient of my objective function with respect to some parameters. In specific the objective function depends on the solution of a differential equation which in turn depends on said parameters which I am looking to find the optimum of.
To perform this there are numerous good packages both in Julia (see here), as well as CVODES from SUNDIALS, however the latter which does apparently have a wrapper made for python, does not include sensitivity analysis capabilities according to this link. Furthermore, I have looked into SALib for sensitivity analysis, but as far as I understand this refers to some other type of 'sensitivity analysis' and therefore adjoint or even forward sensitivity analysis is not included (correct me if I am wrong on this one).
Thus my question is, does a version of CVODES exist in python with sensitivity analysis capabilities, or is there there any other package where one can use in order to perform adjoint sensitivity analys?
You can easily call Julia code / packages from Python with pyjulia.
https://github.com/JuliaPy/pyjulia
You can try Assimulo, which is a Python wrapper of the SUNDIALS suite. I've been using it for some years now and it works pretty robustly. So far, I have performed forward sensitivity analysis on ODE systems with moderate number of states/parameters using CVODEs (less than 20 states, less than 10 parameters). It works pretty well in terms of robustness (can handle stiff problems, and also supports a variety of linear solvers for sparse problems) and speed, and also supports DAEs through IDAs.
I have installed Assimulo using conda, which deals with all the dependency tree (including SUNDIALS in its more recent version). Finally, I'm not aware whether adjoint sensitivity analysis can be performed using Assimulo. If you find something, let us all know.
I am trying to use the scipy.linalg package but without the overhead. The funtions I currently use are scipy.linalg.cho_factor and scipy.linalg.cho_solve for cholesky decomposition. I have to use scipy.linalg.lapack or scipy.linalg.cython_lapack (or even blas), especially for the latter function which is being called thousands of times.
The problem is I don't know where to start, since there are tons of functions and their names are encoded e.g. cbbcsd, cbdsqr etc. The question is: how do you find the right function?
One option is to look inside the Python/Cython code for scipy.linalg.cho_factor and see what it uses. For cho_factor it looks like it might be ?potrf. The question mark is replaced with a letter depending on the data types.
More generally, if you know which operation you want to apply, the names are constructed pretty systematically. See http://www.netlib.org/lapack/lug_old/node26.html and http://www.netlib.org/lapack/lug_old/node26.html for example. Typically you have to know the data-type (single precision (i.e. float), double, complex, etc...), the symmetry (if any) of the matrices and then the operation.
BLAS is largely just multiplications of various combinations of matrices and vectors and some solutions of systems of linear equations, while Lapack is anything more advanced (decompositions, eigenvalues, etc).
With the python package scipy one can find the principle value of a function (given that the pole is of low order) using the "cauchy" weighting method, see scipy.integrate.quad (consider for instance this question, where its usage is demonstrated). Is something analogous possible within the julia ecosystem (of course on can import scipy easily, but the native integration packages of julia should be, in principle, superior).
There doesn't seem to be any native library that does this. GSL has it (https://www.gnu.org/software/gsl/doc/html/integration.html#qawc-adaptive-integration-for-cauchy-principal-values), so you can call it through https://github.com/JuliaMath/GSL.jl
I am working on multi-objective optimization in Matlab, and am using the fiminimax in the Optimization toolbox. I want to know if fminimax applies Pareto optimization, and if not, why? Also, can you suggest a multi-objective optimization package in Matlab or Python that does use Pareto?
For python, DEAP may be the one you're looking for. Extensive documentation with a lot of real life examples, and a really helpful Google Groups forum. It implements two robust MO algorithms: NSGA-II and SPEA-II.
Edit (as requested)
I am using DEAP for my MSc thesis, so I will let you know how we are using Pareto optimality. Setting DEAP up is pretty straight-forward, as you will see in the examples. Use this one as a starting point. This is the short version, which uses the built-in algorithms and operators. Read both and then follow these guidelines.
As the OneMax example is single-objective, it doesn't use MO algorithms. However, it's easy to implement them:
Change your evaluation function so it returns a n-tuple with the desired scores. If you want to minimize standard deviation too, something like return sum(individual), numpy.std(individual) would work.
Also, modify the weights parameter of the base.Fitness object so it matches that returned n-tuple. A positive float means maximization, while a negative one means minimization. You can use any real number, but I would stick with 1.0 and -1.0 for the sake of simplicity.
Change your genetic operators to cxSimulatedBinaryBounded(), mutPolynomialBounded() and selNSGA2(), for crossover, mutation and selection operations, respectively. These are the suggested methods, as they were developed by the NSGA-II authors.
If you want to use one of the embedded ready-to-go algorithms in DEAP, choose MuPlusLambda().
When calling the algorithm, remember to change the halloffame parameter from HallOfFame() to ParetoFront(). This will return all non-dominated individuals, instead of the best lexicographically sorted "best individuals in all generations". Then you can resolve your Pareto Front as desired: weighted sum, custom lexicographic sorting, etc.
I hope that helps. Take into account that there's also a full, somehow more advanced, NSGA2 example available here.
For fminimax and fgoalattain it looks like the answer is no. However, the genetic algorithm solver, gamultiobj, is Pareto set-based, though I'm not sure if it's the kind of multi-objective optimization function you want to use. gamultiobj implements the NGSA-II evolutionary algorithm. There's also this package that implements the Strengthen Pareto Evolutionary Algorithm 2 (SPEA-II) in C with a Matlab mex interface. It's a bit old so you might want to recompile it (you'll need to anyways if you're not on Windows 32-bit).
long-time R and Python user here. I use R for my daily data analysis and Python for tasks heavier on text processing and shell-scripting. I am working with increasingly large data sets, and these files are often in binary or text files when I get them. The type of things I do normally is to apply statistical/machine learning algorithms and create statistical graphics in most cases. I use R with SQLite sometimes and write C for iteration-intensive tasks; before looking into Hadoop, I am considering investing some time in NumPy/Scipy because I've heard it has better memory management [and the transition to Numpy/Scipy for one with my background seems not that big] - I wonder if anyone has experience using the two and could comment on the improvements in this area, and if there are idioms in Numpy that deal with this issue. (I'm also aware of Rpy2 but wondering if Numpy/Scipy can handle most of my needs). Thanks -
R's strength when looking for an environment to do machine learning and statistics is most certainly the diversity of its libraries. To my knowledge, SciPy + SciKits cannot be a replacement for CRAN.
Regarding memory usage, R is using a pass-by-value paradigm while Python is using pass-by-reference. Pass-by-value can lead to more "intuitive" code, pass-by-reference can help optimize memory usage. Numpy also allows to have "views" on arrays (kind of subarrays without a copy being made).
Regarding speed, pure Python is faster than pure R for accessing individual elements in an array, but this advantage disappears when dealing with numpy arrays (benchmark). Fortunately, Cython lets one get serious speed improvements easily.
If working with Big Data, I find the support for storage-based arrays better with Python (HDF5).
I am not sure you should ditch one for the other but rpy2 can help you explore your options about a possible transition (arrays can be shuttled between R and Numpy without a copy being made).
I use NumPy daily and R nearly so.
For heavy number crunching, i prefer NumPy to R by a large margin (including R packages, like 'Matrix') I find the syntax cleaner, the function set larger, and computation is quicker (although i don't find R slow by any means). NumPy's Broadcasting functionality for instance, i do not think has an analog in R.
For instance, to read in a data set from a csv file and 'normalize' it for input to an ML algorithm (e.g., mean center then re-scale each dimension) requires just this:
data = NP.loadtxt(data1, delimiter=",") # 'data' is a NumPy array
data -= NP.mean(data, axis=0)
data /= NP.max(data, axis=0)
Also, i find that when coding ML algorithms, i need data structures that i can operate on element-wise and that also understand linear algebra (e.g., matrix multiplication, transpose, etc.). NumPy gets this and allows you to create these hybrid structures easily (no operator overloading or subclassing, etc.).
You won't be disappointed by NumPy/SciPy, more likely you'll be amazed.
So, a few recommendations--in general and in particular, given the facts in your question:
install both NumPy and Scipy. As a rough guide, NumPy provides the
core data structures (in particular
the ndarray) and SciPy (which is
actually several times larger than
NumPy) provides the domain-specific
functions (e.g., statistics, signal
processing, integration).
install the repository versions, particularly w/r/t NumPy because the
dev version is 2.0. Matplotlib and NumPy are tightly integrated, you can use one without the other of course, but both are the best in their respective class among python libraries. You can get all three via easy_install, which i assume you already.
NumPy/SciPy have several modules
specifically directed to Machine
Learning/Statistics, including the Clustering package and the Statistics package.
As well as packages directed to
general computation, but which are
make coding ML algorithms a lot
faster, in particular,
Optimization and Linear Algebra.
There are also the SciKits, not included in the base NumPy or
SciPy libraries; you need to install them separately.
Generally speaking, each SciKit is a
set of convenience wrappers to
streamline coding in a given domain. The SciKits you are likely to find most relevant are: ann (approximate Nearest Neighbor), and learn (a set of ML/Statistics regression and classification algorithms, e.g., Logistic Regression, Multi-Layer Perceptron, Support Vector Machine).