Using numba.jit to speed up right-hand-side calculations for odeint from scipy.integrate works fine:
import numpy as np
from scipy.integrate import ode, odeint
from numba import jit

@jit
def rhs(t, X):
    return 1

X = odeint(rhs, 0, np.linspace(0, 1, 11))
However using integrate.ode like this:
solver = ode(rhs)
solver.set_initial_value(0, 0)
while solver.successful() and solver.t < 1:
    solver.integrate(solver.t + 0.1)
produces the following error with the decorator @jit:
capi_return is NULL
Call-back cb_f_in_dvode__user__routines failed.
Traceback (most recent call last):
File "sandbox/numba_cubic.py", line 15, in <module>
solver.integrate(solver.t + 0.1)
File "/home/pgermann/Software/anaconda3/lib/python3.4/site-packages/scipy/integrate/_ode.py", line 393, in integrate
self.f_params, self.jac_params)
File "/home/pgermann/Software/anaconda3/lib/python3.4/site-packages/scipy/integrate/_ode.py", line 848, in run
y1, t, istate = self.runner(*args)
TypeError: not enough arguments: expected 2, got 1
Any ideas how to overcome this?
You can use a wrapper function, but I think it will not improve your performance for small rhs functions.
@jit(nopython=True)
def rhs(t, X):
    return 1

def wrapper(t, X):
    return rhs(t, X)

solver = ode(wrapper)
solver.set_initial_value(0, 0)
while solver.successful() and solver.t < 1:
    solver.integrate(solver.t + 0.1)
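The loop above discards the intermediate values. A minimal sketch of how the results could be recorded, with a plain-Python rhs standing in for the jitted one (so it runs without Numba); the names ts, ys and t_target are only illustrative. Stepping over fixed targets from np.linspace also avoids the extra step that accumulating solver.t + 0.1 in floating point can cause.

```python
import numpy as np
from scipy.integrate import ode

def rhs(t, X):
    # dX/dt = 1, so the exact solution is X(t) = t
    return [1.0]

def wrapper(t, X):
    # plain-Python wrapper, as in the workaround above
    return rhs(t, X)

solver = ode(wrapper)
solver.set_initial_value(0.0, 0.0)

ts, ys = [solver.t], [solver.y[0]]
for t_target in np.linspace(0.1, 1.0, 10):
    solver.integrate(t_target)
    if not solver.successful():
        break
    ts.append(solver.t)
    ys.append(solver.y[0])

print(ts[-1], ys[-1])
```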
I do not know the reason or a solution; however, in this case Theano helped a lot to speed up the calculation. Theano essentially compiles numpy expressions, so it only helps when you can write the rhs as an expression of multi-dimensional arrays (whereas jit also understands for loops and friends). It also knows some algebra and optimizes the calculation.
Besides, Theano can compile for the GPU (which was my reason to try numba.jit in the first place). However, because of the overhead, using the GPU turned out to improve performance only for huge systems (maybe one million equations).
I need to compute many 2D integrations over domains that are simply connected (and convex most of the time). I'm using python function scipy.integrate.nquad to do this integration. However, the time required by this operation is significantly large compared to integration over a rectangular domain. Is there any faster implementation possible?
Here is an example; I integrate a constant function first over a circular domain (using a constraint inside the function) and then on a rectangular domain (default domain of nquad function).
from scipy import integrate
import time

def circular(x,y,a):
    if x**2 + y**2 < a**2/4:
        return 1
    else:
        return 0

def rectangular(x,y,a):
    return 1

a = 4
start = time.time()
result = integrate.nquad(circular, [[-a/2, a/2],[-a/2, a/2]], args=(a,))
now = time.time()
print(now-start)

start = time.time()
result = integrate.nquad(rectangular, [[-a/2, a/2],[-a/2, a/2]], args=(a,))
now = time.time()
print(now-start)
The rectangular domain takes only 0.00029 seconds, while the circular domain requires 2.07061 seconds to complete.
Also the circular integration gives the following warning:
IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
If increasing the limit yields no improvement it is advised to analyze
the integrand in order to determine the difficulties. If the position of a
local difficulty can be determined (singularity, discontinuity) one will
probably gain from splitting up the interval and calling the integrator
on the subranges. Perhaps a special-purpose integrator should be used.
One way to make the calculation faster is to use numba, a just-in-time compiler for Python.
The @jit decorator
Numba provides a @jit decorator that compiles Python code into optimized machine code, which can also be run in parallel on several CPUs. Jitting the integrand function takes little effort and saves some time because the compiled code runs faster. One doesn't even have to worry about types; Numba handles all of this under the hood.
from scipy import integrate
from numba import jit

@jit
def circular_jit(x, y, a):
    if x**2 + y**2 < a**2 / 4:
        return 1
    else:
        return 0

a = 4
result = integrate.nquad(circular_jit, [[-a/2, a/2],[-a/2, a/2]], args=(a,))
This does run faster; timing it on my machine, I get:
Original circular function: 1.599048376083374
Jitted circular function: 0.8280022144317627
That is a ~50% reduction of computation time.
Scipy's LowLevelCallable
Function calls in Python are quite time consuming due to the nature of the language. The overhead can sometimes make Python code slow in comparison to compiled languages like C.
In order to mitigate this, Scipy provides a LowLevelCallable class which can be used to provide access to a low-level compiled callback function. Through this mechanism, Python's function call overhead is bypassed and further time saving can be made.
Note that in the case of nquad, the signature of the cfunc passed to LowLevelCallable must be one of:
double func(int n, double *xx)
double func(int n, double *xx, void *user_data)
where the int is the number of arguments and the values for the arguments are in the second argument. user_data is used for callbacks that need context to operate.
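To see this calling convention in isolation, here is a small sketch that builds a callback with the first signature using only ctypes, so it runs without Numba. It is purely illustrative: the callback body is still Python, so it brings no speed-up, and the smooth integrand and the names PROTOTYPE, smooth and smooth_llc are my own, not part of SciPy.

```python
import ctypes
from scipy import integrate, LowLevelCallable

# C prototype: double func(int n, double *xx)
PROTOTYPE = ctypes.CFUNCTYPE(ctypes.c_double,
                             ctypes.c_int,
                             ctypes.POINTER(ctypes.c_double))

@PROTOTYPE
def smooth(n, xx):
    # xx holds the integration variables first, then the extra args,
    # so here n == 3 and xx = [x, y, a]
    x, y, a = xx[0], xx[1], xx[2]
    return x * x + y * y + a

smooth_llc = LowLevelCallable(smooth)

# Integral of x^2 + y^2 + a over the unit square is 2/3 + a
result, abserr = integrate.nquad(smooth_llc, [[0, 1], [0, 1]], args=(4.0,))
print(result)
```

With Numba, the cfunc decorator produces an object whose ctypes attribute plays the same role as smooth here, but with a compiled body.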
We can therefore slightly change the circular function signature in Python to make it compatible.
from scipy import integrate, LowLevelCallable
from numba import cfunc
from numba.types import intc, CPointer, float64

@cfunc(float64(intc, CPointer(float64)))
def circular_cfunc(n, args):
    x, y, a = (args[0], args[1], args[2]) # Cannot do `(args[i] for i in range(n))` as `yield` is not supported
    if x**2 + y**2 < a**2/4:
        return 1
    else:
        return 0

circular_LLC = LowLevelCallable(circular_cfunc.ctypes)

a = 4
result = integrate.nquad(circular_LLC, [[-a/2, a/2],[-a/2, a/2]], args=(a,))
With this method I get
LowLevelCallable circular function: 0.07962369918823242
This is a 95% reduction compared to the original and 90% when compared to the jitted version of the function.
A bespoke decorator
In order to make the code more tidy and to keep the integrand function's signature flexible, a bespoke decorator function can be created. It will jit the integrand function and wrap it into a LowLevelCallable object that can then be used with nquad.
from scipy import integrate, LowLevelCallable
from numba import cfunc, jit
from numba.types import intc, CPointer, float64

def jit_integrand_function(integrand_function):
    jitted_function = jit(integrand_function, nopython=True)

    @cfunc(float64(intc, CPointer(float64)))
    def wrapped(n, xx):
        return jitted_function(xx[0], xx[1], xx[2])
    return LowLevelCallable(wrapped.ctypes)

@jit_integrand_function
def circular(x, y, a):
    if x**2 + y**2 < a**2 / 4:
        return 1
    else:
        return 0

a = 4
result = integrate.nquad(circular, [[-a/2, a/2],[-a/2, a/2]], args=(a,))
Arbitrary number of arguments
If the number of arguments is unknown, then we can use the convenient carray function provided by Numba to convert the CPointer(float64) to a Numpy array.
import numpy as np
from scipy import integrate, LowLevelCallable
from numba import cfunc, carray, jit
from numba.types import intc, CPointer, float64

def jit_integrand_function(integrand_function):
    jitted_function = jit(integrand_function, nopython=True)

    @cfunc(float64(intc, CPointer(float64)))
    def wrapped(n, xx):
        ar = carray(xx, n)
        return jitted_function(ar[0], ar[1], ar[2:])
    return LowLevelCallable(wrapped.ctypes)

@jit_integrand_function
def circular(x, y, a):
    if x**2 + y**2 < a[-1]**2 / 4:
        return 1
    else:
        return 0

ar = np.array([1, 2, 3, 4])
a = ar[-1]
result = integrate.nquad(circular, [[-a/2, a/2],[-a/2, a/2]], args=ar)
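Independently of compiling the integrand, the warning quoted in the question hints at another option: remove the discontinuity instead of integrating through it. nquad accepts a callable as a range, so the inner limits can follow the circle and the integrand stays smooth. A minimal sketch, assuming the domain is a disc of radius a/2 as in the question; the names one and x_limits are just for illustration.

```python
import math
from scipy import integrate

def one(x, y, a):
    # constant integrand; the domain is encoded in the limits instead
    return 1.0

def x_limits(y, a):
    # for a given y, x runs between the two points on the circle x^2 + y^2 = a^2/4
    half_width = math.sqrt(max(a**2 / 4 - y**2, 0.0))
    return [-half_width, half_width]

a = 4
# ranges[0] belongs to the innermost variable x and may depend on y and the extra args
result, abserr = integrate.nquad(one, [x_limits, [-a / 2, a / 2]], args=(a,))
print(result, math.pi * a**2 / 4)
```

This only works when the boundary can be written as explicit limits, which is often the case for convex domains.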
I am playing with the cvxpy library in order to solve some particular optimisation problem
import cvxpy as cp
import numpy as np
(...)
prob = cp.Problem(
    cp.Minimize(max(M*theta-b)) <= 45,
    [-48 <= theta, theta <= 48])
(Here M and b are certain numpy matrices.)
Interestingly, it screams:
NotImplementedError Traceback (most recent call last)
<ipython-input-62-0296c965b1ff> in <module>
1 prob = cp.Problem(
----> 2 cp.Minimize(max(M*theta-b)) <= 45,
3 [-10 <= theta, theta <= 10])
~\Anaconda3\lib\site-packages\cvxpy\expressions\expression.py in __gt__(self, other)
595 """Unsupported.
596 """
--> 597 raise NotImplementedError("Strict inequalities are not allowed.")
NotImplementedError: Strict inequalities are not allowed.
however, to me, they do not look strict at all...
Same reason as in your earlier question (although things like that are hard to analyze).
You need to ask cvxpy for its max function explicitly. This is always required / recommended.
cp.Minimize(max(M*theta-b))
should be
cp.Minimize(cp.max(M*theta-b))
You basically have to use only functions from cvxpy, except for the following:
The CVXPY function sum sums all the entries in a single expression. The built-in Python sum should be used to add together a list of expressions.
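As for why the error mentions strict inequalities even though none appear in the code: Python's built-in max compares its items with >, and the traceback shows that the expression's __gt__ raises. The toy class below is a stand-in I made up to show the mechanism; it is not the cvxpy class.

```python
class Expr:
    """Toy stand-in for a symbolic expression (hypothetical, not cvxpy's class)."""

    def __init__(self, name):
        self.name = name

    def __le__(self, other):
        # non-strict comparison is allowed and builds a "constraint"
        return f"{self.name} <= {other}"

    def __gt__(self, other):
        raise NotImplementedError("Strict inequalities are not allowed.")

    __lt__ = __gt__

entries = [Expr("e0"), Expr("e1")]

try:
    max(entries)  # built-in max evaluates entries[1] > entries[0]
    message = None
except NotImplementedError as exc:
    message = str(exc)

print(message)
```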
So I wanted to speed up a program I wrote with the help of numba jit. However jit seems to be not compatible with many scipy functions because they use try ... except ... structures that jit cannot handle (Am I right with this point?)
A relatively simple solution I came up with is to copy the scipy source code I need and delete the try except parts (I already know that it will not run into errors so the try part will always work anyways)
However I do not like this solution and I am not sure if it will work.
My code structure looks like the following
import scipy.integrate as integrate
from scipy.optimize import curve_fit
from numba import jit

def fitfunction():
    ...

@jit
def function(x):
    # do some stuff
    try:
        fit_param, fit_cov = curve_fit(fitfunction, x, y, p0=(0,0,0), maxfev=500)
        for idx in some_list:
            integrated = integrate.quad(lambda x: fitfunction(fit_param), lower, upper)
    except:
        fit_param=(0,0,0)
    ...
Now this results in the following error:
LoweringError: Failed at object (object mode backend)
I assume this is due to jit not being able to handle try except (it also does not work if I only put jit on the curve_fit and integrate.quad parts and work around my own try except structure)
import scipy.integrate as integrate
from scipy.optimize import curve_fit
from numba import jit

def fitfunction():
    ...

@jit
def integral(lower, upper):
    return integrate.quad(lambda x: fitfunction(fit_param), lower, upper)

@jit
def fitting(x, y, pzero, max_fev):
    return curve_fit(fitfunction, x, y, p0=pzero, maxfev=max_fev)

def function(x):
    # do some stuff
    try:
        fit_param, fit_cov = fitting(x, y, (0,0,0), 500)
        for idx in some_list:
            integrated = integral(lower, upper)
    except:
        fit_param=(0,0,0)
    ...
Is there a way to use jit with scipy.integrate.quad and curve_fit without manually deleting all try except structures from the scipy code?
And would it even speed up the code?
Numba simply is not a general-purpose library to speed code up. There is a class of problems that can be solved in a much faster way with numba (especially if you have loops over arrays, number crunching) but everything else is either (1) not supported or (2) only slightly faster or even a lot slower.
[...] would it even speed up the code?
SciPy is already a high-performance library so in most cases I would expect numba to perform worse (or rarely: slightly better). You might do some profiling to find out if the bottleneck is really in the code that you jitted, then you could get some improvements. But I suspect the bottleneck will be in the compiled code of SciPy and that compiled code is probably already heavily optimized (so it's really unlikely that you find an implementation that could "only" compete with that code).
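A rough sketch of what such profiling could look like with the standard library's cProfile. The workload is a made-up stand-in for the question's code (a sine fit followed by a series of integrals), so the data and the names workload and report are assumptions of mine.

```python
import cProfile
import io
import pstats

import numpy as np
from scipy import integrate
from scipy.optimize import curve_fit

def fitfunction(x, A, B):
    return A * np.sin(B * x)

def workload():
    # hypothetical stand-in for the real program: one fit, then many integrals
    rng = np.random.default_rng(0)
    x = np.linspace(0, 2 * np.pi, 100)
    y = np.sin(x) + 0.1 * rng.random(100)
    fit_param, _ = curve_fit(fitfunction, x, y, p0=(2.0, 0.8), maxfev=500)
    return [integrate.quad(fitfunction, 0.0, upper, args=tuple(fit_param))[0]
            for upper in np.linspace(np.pi, 2 * np.pi, 50)]

profiler = cProfile.Profile()
profiler.enable()
results = workload()
profiler.disable()

# Print the ten entries with the largest cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

If most of the cumulative time sits inside SciPy's own routines rather than in fitfunction, jitting fitfunction is unlikely to change much.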
Is there a way to use jit with scipy.integrate.quad and curve_fit without manually deleting all try except structures from the scipy code?
As you correctly assumed try and except is simply not supported by numba at this time.
2.6.1. Language
2.6.1.1. Constructs
Numba strives to support as much of the Python language as possible, but some language features are not available inside Numba-compiled functions. The following Python language features are not currently supported:
[...]
Exception handling (try .. except, try .. finally)
So the answer here is No.
Nowadays try and except work with numba. However numba and scipy are still not compatible. Yes, Scipy calls compiled C and Fortran, but it does so in a way that numba can't deal with.
Fortunately there are alternatives to scipy that work well with numba! Below I use NumbaQuadpack and NumbaMinpack to do some curve fitting and integration similar to your example code. Disclaimer: I put together these packages. Below, I also give an equivalent implementation in scipy.
The Scipy implementation is ~18 times slower than the Scipy alternatives (NumbaQuadpack and NumbaMinpack).
Using Scipy alternatives (0.23 ms)
from NumbaQuadpack import quadpack_sig, dqags
from NumbaMinpack import minpack_sig, lmdif
import numpy as np
import numba as nb
import timeit

np.random.seed(0)
x = np.linspace(0,2*np.pi,100)
y = np.sin(x)+ np.random.rand(100)

@nb.jit
def fitfunction(x, A, B):
    return A*np.sin(B*x)

@nb.cfunc(minpack_sig)
def fitfunction_optimize(u_, fvec, args_):
    u = nb.carray(u_,(2,))
    args = nb.carray(args_,(200,))
    A, B = u
    x = args[:100]
    y = args[100:]
    for i in range(100):
        fvec[i] = fitfunction(x[i], A, B) - y[i]
optimize_ptr = fitfunction_optimize.address

@nb.cfunc(quadpack_sig)
def fitfunction_integrate(x, data):
    A = data[0]
    B = data[1]
    return fitfunction(x, A, B)
integrate_ptr = fitfunction_integrate.address

@nb.njit
def fast_function():
    try:
        neqs = 100
        u_init = np.array([2.0,.8],np.float64)
        args = np.append(x,y)
        fitparam, fvec, success, info = lmdif(optimize_ptr , u_init, neqs, args)
        if not success:
            raise Exception
        lower = 0.0
        uppers = np.linspace(np.pi,np.pi*2.0,200)
        solutions = np.empty(len(uppers))
        for i in range(len(uppers)):
            solutions[i], abserr, success = dqags(integrate_ptr, lower, uppers[i], data = fitparam)
            if not success:
                raise Exception
    except:
        print('doing something else')

fast_function()
iters = 1000
t_nb = timeit.Timer(fast_function).timeit(number=iters)/iters
print(t_nb)
Using Scipy (4.4 ms)
import scipy.integrate as integrate
from scipy.optimize import curve_fit
import numpy as np
import numba as nb
import timeit

np.random.seed(0)
x = np.linspace(0,2*np.pi,100)
y = np.sin(x)+ np.random.rand(100)

@nb.jit
def fitfunction(x, A, B):
    return A*np.sin(B*x)

def function():
    try:
        p0 = (2.0,.8)
        fit_param, fit_cov = curve_fit(fitfunction, x, y, p0=p0, maxfev=500)
        lower = 0.0
        uppers = np.linspace(np.pi,np.pi*2.0,200)
        solutions = np.empty(len(uppers))
        for i in range(len(uppers)):
            solutions[i], abserr = integrate.quad(fitfunction, lower, uppers[i], args = tuple(fit_param))
    except:
        print('do something else')

function()
iters = 1000
t_sp = timeit.Timer(function).timeit(number=iters)/iters
print(t_sp)
I keep getting errors when I tried to solve a system of three equations using the following code in python3:
import sympy
from sympy import Symbol, solve, nsolve
x = Symbol('x')
y = Symbol('y')
z = Symbol('z')
eq1 = x - y + 3
eq2 = x + y
eq3 = z - y
print(nsolve( (eq1, eq2, eq3), (x,y,z), (-50,50)))
Here is the error message:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/mpmath/calculus/optimization.py", line 928, in findroot
    fx = f(*x0)
TypeError: () missing 1 required positional argument: '_Dummy_15'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in
  File "", line 12, in
  File "/usr/lib/python3/dist-packages/sympy/solvers/solvers.py", line 2498, in nsolve
    x = findroot(f, x0, J=J, **kwargs)
  File "/usr/lib/python3/dist-packages/mpmath/calculus/optimization.py", line 931, in findroot
    fx = f(x0[0])
TypeError: () missing 2 required positional arguments: '_Dummy_14' and '_Dummy_15'
The strange thing is, the error message goes away if I only solve the first two equation --- by changing the last line of the code to
print(nsolve( (eq1, eq2), (x,y), (-50,50)))
output:
exec(open('bug444.py').read())
[-1.5]
[ 1.5]
I'm baffled; your help is most appreciated!
A few pieces of additional info:
I'm using python3.4.0 + sympy 0.7.6-3 on ubuntu 14.04. I got the same error in python2
I could solve this system using
solve( [eq1,eq2,eq3], [x,y,z] )
but this system is just a toy example; in the actual applications the system is non-linear and I need higher precision, and I don't see how to adjust the precision for solve, whereas for nsolve I could use nsolve(... , prec=100)
THANKS!
In your print statement, you are missing your guess for z
print(nsolve((eq1, eq2, eq3), (x, y, z), (-50, 50)))
try this (in most cases, using 1 for all the guesses is fine):
print(nsolve((eq1, eq2, eq3), (x, y, z), (1, 1, 1)))
Output:
[-1.5]
[ 1.5]
[ 1.5]
You can discard the initial guesses/dummies if you use linsolve:
>>> from sympy import linsolve
>>> print(linsolve((eq1, eq2, eq3), x,y,z))
{(-3/2, 3/2, 3/2)}
And then you can use nonlinsolve for your non linear problem set.
The problem is that the number of guesses must equal the number of variables:
print(nsolve((eq1, eq2, eq3), (x,y,z), (-50,50,50)))
If you're using a numerical solver on a multidimensional problem, it needs to start from somewhere and follow a gradient to the solution.
The guess vector is where you start.
If there are multiple local minima / maxima in the space, different guess vectors can lead to different outputs.
Or an unfortunate guess vector may not converge at all.
For a one-dimensional problem the guess vector is just x0.
For most functions you can write down easily, almost any vector will converge to the one global solution.
So the guess vector (1,1,1) here is as good as (-50,50,50).
Just don't leave any entry of the guess vector unspecified, or the program fails.
your code should be:
nsolve([eq1, eq2, eq3], [x,y,z], [1,1,1])
your code was:
nsolve([eq1, eq2, eq3], [x,y,z], [1,1])
You were missing one guess value in the last argument.
The point is: if you are solving for n unknowns, you must provide a guess for each of them (n guesses in the last argument).
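Since the real system is non-linear and needs higher precision, here is a small sketch along the same lines. The three equations are invented for illustration (they are not the asker's actual system), and prec=50 is an arbitrary choice.

```python
from sympy import symbols, nsolve

x, y, z = symbols("x y z")

# hypothetical non-linear system with solution x = y = sqrt(2), z = 2
equations = (x**2 + y**2 - 4, x - y, z - x*y)
unknowns = (x, y, z)
guess = (1, 1, 1)

# one starting value per unknown, otherwise nsolve fails as in the question
assert len(guess) == len(unknowns)

solution = nsolve(equations, unknowns, guess, prec=50)
print(solution)
```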
I'm trying to create a distribution based on some data I have, then draw randomly from that distribution. Here's what I have:
from scipy import stats
import numpy
def getDistribution(data):
    kernel = stats.gaussian_kde(data)
    class rv(stats.rv_continuous):
        def _cdf(self, x):
            return kernel.integrate_box_1d(-numpy.Inf, x)
    return rv()

if __name__ == "__main__":
    # pretend this is real data
    data = numpy.concatenate((numpy.random.normal(2,5,100), numpy.random.normal(25,5,100)))
    d = getDistribution(data)
    print d.rvs(size=100) # this usually fails
I think this is doing what I want it to, but I frequently get an error (see below) when I try to do d.rvs(), and d.rvs(100) never works. Am I doing something wrong? Is there an easier or better way to do this? If it's a bug in scipy, is there some way to get around it?
Finally, is there more documentation on creating custom distributions somewhere? The best I've found is the scipy.stats.rv_continuous documentation, which is pretty spartan and contains no useful examples.
The traceback:
Traceback (most recent call last):
  File "testDistributions.py", line 19, in <module>
    print d.rvs(size=100)
  File "/usr/local/lib/python2.6/dist-packages/scipy-0.10.0-py2.6-linux-x86_64.egg/scipy/stats/distributions.py", line 696, in rvs
    vals = self._rvs(*args)
  File "/usr/local/lib/python2.6/dist-packages/scipy-0.10.0-py2.6-linux-x86_64.egg/scipy/stats/distributions.py", line 1193, in _rvs
    Y = self._ppf(U,*args)
  File "/usr/local/lib/python2.6/dist-packages/scipy-0.10.0-py2.6-linux-x86_64.egg/scipy/stats/distributions.py", line 1212, in _ppf
    return self.vecfunc(q,*args)
  File "/usr/local/lib/python2.6/dist-packages/numpy-1.6.1-py2.6-linux-x86_64.egg/numpy/lib/function_base.py", line 1862, in __call__
    theout = self.thefunc(*newargs)
  File "/usr/local/lib/python2.6/dist-packages/scipy-0.10.0-py2.6-linux-x86_64.egg/scipy/stats/distributions.py", line 1158, in _ppf_single_call
    return optimize.brentq(self._ppf_to_solve, self.xa, self.xb, args=(q,)+args, xtol=self.xtol)
  File "/usr/local/lib/python2.6/dist-packages/scipy-0.10.0-py2.6-linux-x86_64.egg/scipy/optimize/zeros.py", line 366, in brentq
    r = _zeros._brentq(f,a,b,xtol,maxiter,args,full_output,disp)
ValueError: f(a) and f(b) must have different signs
Edit
For those curious, following the advice in the answer below, here's code that works:
from scipy import stats
import numpy

def getDistribution(data):
    kernel = stats.gaussian_kde(data)
    class rv(stats.rv_continuous):
        def _rvs(self, *x, **y):
            # don't ask me why it's using self._size
            # nor why I have to cast to int
            return kernel.resample(int(self._size))
        def _cdf(self, x):
            return kernel.integrate_box_1d(-numpy.Inf, x)
        def _pdf(self, x):
            return kernel.evaluate(x)
    return rv(name='kdedist', xa=-200, xb=200)
Specifically to your traceback:
rvs uses the inverse of the cdf, ppf, to create random numbers. Since you are not specifying ppf, it is calculated by a rootfinding algorithm, brentq. brentq uses lower and upper bounds on where it should search for the value at which the function is zero (find x such that cdf(x)=q, where q is the quantile).
The defaults for the limits, xa and xb, are too narrow for your example. The following works for me with scipy 0.9.0; xa and xb can be set when creating the distribution instance:
def getDistribution(data):
    kernel = stats.gaussian_kde(data)
    class rv(stats.rv_continuous):
        def _cdf(self, x):
            return kernel.integrate_box_1d(-numpy.Inf, x)
    return rv(name='kdedist', xa=-200, xb=200)
There is currently a pull request for scipy to improve this, so in the next release xa and xb will be expanded automatically to avoid the f(a) and f(b) must have different signs exception.
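The bracketing problem can be reproduced without a custom distribution. In this sketch a normal distribution centred at 25 stands in for the KDE, and the narrow bracket [-10, 10] is only an illustrative choice; the point is that cdf(x) - q has the same sign at both ends, so brentq refuses to start.

```python
from scipy import optimize, stats

dist = stats.norm(loc=25, scale=5)  # stand-in for the KDE-based distribution
q = 0.9

def ppf_to_solve(x):
    # the root of this function is the q-quantile
    return dist.cdf(x) - q

try:
    optimize.brentq(ppf_to_solve, -10, 10)  # bracket does not contain the root
    narrow_error = None
except ValueError as exc:
    narrow_error = str(exc)

root = optimize.brentq(ppf_to_solve, -200, 200)  # wide bracket works
print(narrow_error)
print(root, dist.ppf(q))
```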
There is not much documentation on this; the easiest approach is to follow some examples (and ask on the mailing list).
edit: addition
pdf: Since you have the density function also given by gaussian_kde, I would add the _pdf method, which will make some calculations more efficient.
edit2: addition
rvs: If you are interested in generating random numbers, then gaussian_kde has a resample method. Random Samples can be generated by sampling from the data and adding gaussian noise. So, this will be faster than the generic rvs using the ppf method. I would write a ._rvs method that just calls gaussian_kde's resample method.
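If random numbers are all that is needed, resample can also be called directly, without building a distribution class. A short sketch, with synthetic data standing in for the real data:

```python
import numpy as np
from scipy import stats

np.random.seed(0)
# pretend this is real data
data = np.concatenate((np.random.normal(2, 5, 100), np.random.normal(25, 5, 100)))

kernel = stats.gaussian_kde(data)

# for 1-D data resample returns an array of shape (1, size); take the first row
samples = kernel.resample(1000)[0]
print(samples.shape, samples.mean())
```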
precomputing ppf: I don't know of any general way to precompute the ppf. However, the way I thought of doing it (but never tried so far) is to precompute the ppf at many points and then use linear interpolation to approximate the ppf function.
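A sketch of that interpolation idea, untested beyond this toy case. The grid range and size are guesses that happen to suit this data, and approx_ppf is a name I made up; np.interp is simply applied with the roles of x and cdf(x) swapped.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# pretend this is real data
data = np.concatenate((rng.normal(2, 5, 100), rng.normal(25, 5, 100)))
kernel = stats.gaussian_kde(data)

# Tabulate the cdf once on a grid that comfortably covers the data
grid = np.linspace(data.min() - 30, data.max() + 30, 2001)
cdf = np.array([kernel.integrate_box_1d(-np.inf, g) for g in grid])

def approx_ppf(q):
    # the cdf is non-decreasing, so it can serve as the x-coordinates for np.interp
    return np.interp(q, cdf, grid)

# Inverse-transform sampling with the tabulated ppf
samples = approx_ppf(rng.uniform(size=1000))
print(approx_ppf(0.5), samples[:5])
```

The accuracy depends on the grid spacing and is worst in the tails, where the cdf is nearly flat.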
edit3: about _rvs to answer Srivatsan's question in the comment
_rvs is the distribution-specific method that is called by the public method rvs. rvs is a generic method that does some argument checking, adds location and scale, sets the attribute self._size (the size of the requested array of random variables), and then calls the distribution-specific method ._rvs or its generic counterpart. The extra arguments in ._rvs are shape parameters, but since there are none in this case, *x and **y are redundant and unused.
I don't know how well the size or shape of the .rvs method works in the multivariate case. These distributions are designed for univariate distributions, and might not fully work for the multivariate case, or might need some reshapes.