Numba jit with scipy - python

So I wanted to speed up a program I wrote with the help of numba jit. However, jit seems to be incompatible with many scipy functions, because they use try ... except ... structures that jit cannot handle (am I right about this?).
A relatively simple solution I came up with is to copy the scipy source code I need and delete the try/except parts (I already know it will not run into errors, so the try part will always succeed anyway).
However, I do not like this solution and I am not sure it will work.
My code structure looks like the following:
import scipy.integrate as integrate
from scipy.optimize import curve_fit
from numba import jit

def fitfunction():
    ...

@jit
def function(x):
    # do some stuff
    try:
        fit_param, fit_cov = curve_fit(fitfunction, x, y, p0=(0, 0, 0), maxfev=500)
        for idx in some_list:
            integrated = integrate.quad(lambda x: fitfunction(fit_param), lower, upper)
    except:
        fit_param = (0, 0, 0)
    ...
Now this results in the following error:
LoweringError: Failed at object (object mode backend)
I assume this is due to jit not being able to handle try/except (it also does not work if I only put jit on the curve_fit and integrate.quad parts and keep my own try/except structure around them):
import scipy.integrate as integrate
from scipy.optimize import curve_fit
from numba import jit

def fitfunction():
    ...

@jit
def integral(lower, upper):
    return integrate.quad(lambda x: fitfunction(fit_param), lower, upper)

@jit
def fitting(x, y, pzero, max_fev):
    return curve_fit(fitfunction, x, y, p0=pzero, maxfev=max_fev)

def function(x):
    # do some stuff
    try:
        fit_param, fit_cov = fitting(x, y, (0, 0, 0), 500)
        for idx in some_list:
            integrated = integral(lower, upper)
    except:
        fit_param = (0, 0, 0)
    ...
Is there a way to use jit with scipy.integrate.quad and curve_fit without manually deleting all try except structures from the scipy code?
And would it even speed up the code?

Numba simply is not a general-purpose library to speed code up. There is a class of problems that can be solved in a much faster way with numba (especially if you have loops over arrays and number crunching), but everything else is either (1) not supported or (2) only slightly faster or even a lot slower.
[...] would it even speed up the code?
SciPy is already a high-performance library, so in most cases I would expect numba to perform worse (or, rarely, slightly better). You might do some profiling to find out if the bottleneck is really in the code that you jitted; then you could get some improvements. But I suspect the bottleneck will be in the compiled code of SciPy, and that compiled code is probably already heavily optimized (so it's really unlikely that you will find an implementation that can even compete with that code).
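For example, a quick timing sketch along those lines (the model and data below are only illustrative, not taken from your code): measure how long the scipy call itself takes before deciding where to optimize.
import timeit
import numpy as np
from scipy.optimize import curve_fit

# Illustrative model and data, just to time the scipy call in isolation
def model(x, a, b):
    return a * np.sin(b * x)

xdata = np.linspace(0, 2 * np.pi, 100)
ydata = np.sin(xdata) + 0.1 * np.random.rand(100)

t = timeit.timeit(lambda: curve_fit(model, xdata, ydata, p0=(1.0, 1.0)), number=200)
print(t / 200)  # average seconds per fit; compare against the rest of your pipeline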
Is there a way to use jit with scipy.integrate.quad and curve_fit without manually deleting all try except structures from the scipy code?
As you correctly assumed, try and except is simply not supported by numba at this time.
2.6.1. Language
2.6.1.1. Constructs
Numba strives to support as much of the Python language as possible, but some language features are not available inside Numba-compiled functions. The following Python language features are not currently supported:
[...]
Exception handling (try .. except, try .. finally)
So the answer here is No.
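A common workaround is to keep the scipy calls and the try/except in plain Python and jit only the pure numeric code. A minimal sketch of that split (the model function and data here are illustrative, not from the question):
import numpy as np
from numba import njit
from scipy.optimize import curve_fit

@njit
def model_core(x, a, b):
    # pure number crunching: this part numba can compile
    return a * np.sin(b * x)

def model(x, a, b):
    # thin plain-Python wrapper that scipy calls
    return model_core(x, a, b)

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + 0.1 * np.random.rand(100)

try:
    params, cov = curve_fit(model, x, y, p0=(1.0, 1.0), maxfev=500)
except RuntimeError:
    params = (0.0, 0.0)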

Nowadays try and except do work with numba. However, numba and scipy are still not compatible. Yes, SciPy calls compiled C and Fortran, but it does so in a way that numba can't deal with.
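For instance, a minimal sketch of exception handling inside a jitted function (illustrative only; recent numba releases support basic forms such as a bare except or except Exception):
from numba import njit

@njit
def guarded_sqrt(v):
    try:
        if v < 0.0:
            raise ValueError("negative input")
        return v ** 0.5
    except Exception:
        return -1.0

print(guarded_sqrt(4.0), guarded_sqrt(-1.0))  # 2.0 -1.0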
Fortunately there are alternatives to scipy that work well with numba! Below I use NumbaQuadpack and NumbaMinpack to do some curve fitting and integration similar to your example code. Disclaimer: I put together these packages. Below, I also give an equivalent implementation in scipy.
The scipy version is ~18 times slower than the version using these scipy alternatives (NumbaQuadpack and NumbaMinpack).
Using Scipy alternatives (0.23 ms)
from NumbaQuadpack import quadpack_sig, dqags
from NumbaMinpack import minpack_sig, lmdif
import numpy as np
import numba as nb
import timeit

np.random.seed(0)
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x) + np.random.rand(100)

@nb.jit
def fitfunction(x, A, B):
    return A*np.sin(B*x)

@nb.cfunc(minpack_sig)
def fitfunction_optimize(u_, fvec, args_):
    u = nb.carray(u_, (2,))
    args = nb.carray(args_, (200,))
    A, B = u
    x = args[:100]
    y = args[100:]
    for i in range(100):
        fvec[i] = fitfunction(x[i], A, B) - y[i]
optimize_ptr = fitfunction_optimize.address

@nb.cfunc(quadpack_sig)
def fitfunction_integrate(x, data):
    A = data[0]
    B = data[1]
    return fitfunction(x, A, B)
integrate_ptr = fitfunction_integrate.address

@nb.njit
def fast_function():
    try:
        neqs = 100
        u_init = np.array([2.0, .8], np.float64)
        args = np.append(x, y)
        fitparam, fvec, success, info = lmdif(optimize_ptr, u_init, neqs, args)
        if not success:
            raise Exception
        lower = 0.0
        uppers = np.linspace(np.pi, np.pi*2.0, 200)
        solutions = np.empty(len(uppers))
        for i in range(len(uppers)):
            solutions[i], abserr, success = dqags(integrate_ptr, lower, uppers[i], data=fitparam)
            if not success:
                raise Exception
    except:
        print('doing something else')

fast_function()

iters = 1000
t_nb = timeit.Timer(fast_function).timeit(number=iters)/iters
print(t_nb)
Using Scipy (4.4 ms)
import scipy.integrate as integrate
from scipy.optimize import curve_fit
import numpy as np
import numba as nb
import timeit

np.random.seed(0)
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x) + np.random.rand(100)

@nb.jit
def fitfunction(x, A, B):
    return A*np.sin(B*x)

def function():
    try:
        p0 = (2.0, .8)
        fit_param, fit_cov = curve_fit(fitfunction, x, y, p0=p0, maxfev=500)
        lower = 0.0
        uppers = np.linspace(np.pi, np.pi*2.0, 200)
        solutions = np.empty(len(uppers))
        for i in range(len(uppers)):
            solutions[i], abserr = integrate.quad(fitfunction, lower, uppers[i], args=tuple(fit_param))
    except:
        print('do something else')

function()

iters = 1000
t_sp = timeit.Timer(function).timeit(number=iters)/iters
print(t_sp)

Related

How to run NumPy code on the GPU with Numba

I've developed several codes that are heavy on matrix operations; they run quite well, but since I spent some extra money on a GPU, I'd like to take advantage of it... ;-) I've tried several configurations from the numba manual, but clearly something is missing. Why do I receive an error when I try to execute some functions on CUDA (NVIDIA GTX 1050)?
Whether with numba or some other module, how can I execute a portion of code (which requires heavy parallelization) on the GPU?
import numpy as np
from numba import types
from numba.extending import intrinsic
from numba import jit, cuda
from numba import vectorize

@jit
def calculate_portfolio_return(returns, weights):
    portfolio_return = np.sum(returns.mean()*weights)*252
    print("Expected Portfolio Return:", portfolio_return)

@jit
def calculate_portfolio_risk(returns, weights):
    portfolio_variance = np.sqrt(np.dot(weights.T, np.dot(returns.cov()*252, weights)))
    print("Expected Risk:", portfolio_variance)

@jit
def generate_portfolios(weights, returns):
    preturns = []
    pvariances = []
    for i in range(10000):
        # weights = np.random.random(len(stocks_portfolio))
        weights = np.random.random(data.shape[1])
        weights /= np.sum(weights)
        preturns.append(np.sum(returns.mean()*weights)*252)
        pvariances.append(np.sqrt(np.dot(weights.T, np.dot(returns.cov()*252, weights))))
    preturns = np.array(preturns)
    pvariances = np.array(pvariances)
    return preturns, pvariances

......

How to reduce integration time for integration over 2D connected domains

I need to compute many 2D integrals over domains that are simply connected (and convex most of the time). I'm using the Python function scipy.integrate.nquad to do this integration. However, the time required by this operation is significantly larger than for integration over a rectangular domain. Is there any faster implementation possible?
Here is an example; I integrate a constant function first over a circular domain (using a constraint inside the function) and then over a rectangular domain (the default domain of the nquad function).
from scipy import integrate
import time

def circular(x, y, a):
    if x**2 + y**2 < a**2/4:
        return 1
    else:
        return 0

def rectangular(x, y, a):
    return 1

a = 4

start = time.time()
result = integrate.nquad(circular, [[-a/2, a/2], [-a/2, a/2]], args=(a,))
now = time.time()
print(now-start)

start = time.time()
result = integrate.nquad(rectangular, [[-a/2, a/2], [-a/2, a/2]], args=(a,))
now = time.time()
print(now-start)
The rectangular domain takes only 0.00029 seconds, while the circular domain requires 2.07061 seconds to complete.
Also the circular integration gives the following warning:
IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
If increasing the limit yields no improvement it is advised to analyze
the integrand in order to determine the difficulties. If the position of a
local difficulty can be determined (singularity, discontinuity) one will
probably gain from splitting up the interval and calling the integrator
on the subranges. Perhaps a special-purpose integrator should be used.
**opt)
One way to make the calculation faster is to use numba, a just-in-time compiler for Python.
The @jit decorator
Numba provides a @jit decorator to compile some Python code and output optimized machine code that can run in parallel on several CPUs. Jitting the integrand function takes little effort and achieves some time saving, since the code is optimized to run faster. One doesn't even have to worry about types; Numba does all of this under the hood.
from scipy import integrate
from numba import jit

@jit
def circular_jit(x, y, a):
    if x**2 + y**2 < a**2 / 4:
        return 1
    else:
        return 0

a = 4
result = integrate.nquad(circular_jit, [[-a/2, a/2], [-a/2, a/2]], args=(a,))
This indeed runs faster, and when timing it on my machine I get:
Original circular function: 1.599048376083374
Jitted circular function: 0.8280022144317627
That is a ~50% reduction of computation time.
Scipy's LowLevelCallable
Function calls in Python are quite time consuming due to the nature of the language. The overhead can sometimes make Python code slow in comparison to compiled languages like C.
In order to mitigate this, Scipy provides a LowLevelCallable class which can be used to provide access to a low-level compiled callback function. Through this mechanism, Python's function call overhead is bypassed and further time saving can be made.
Note that in the case of nquad, the signature of the cfunc passed to LowLevelCallable must be one of:
double func(int n, double *xx)
double func(int n, double *xx, void *user_data)
where the int is the number of arguments and the values for the arguments are in the second argument. user_data is used for callbacks that need context to operate.
We can therefore slightly change the circular function signature in Python to make it compatible.
from scipy import integrate, LowLevelCallable
from numba import cfunc
from numba.types import intc, CPointer, float64

@cfunc(float64(intc, CPointer(float64)))
def circular_cfunc(n, args):
    x, y, a = (args[0], args[1], args[2])  # Cannot do `(args[i] for i in range(n))` as `yield` is not supported
    if x**2 + y**2 < a**2/4:
        return 1
    else:
        return 0

circular_LLC = LowLevelCallable(circular_cfunc.ctypes)

a = 4
result = integrate.nquad(circular_LLC, [[-a/2, a/2], [-a/2, a/2]], args=(a,))
With this method I get
LowLevelCallable circular function: 0.07962369918823242
This is a 95% reduction compared to the original and 90% when compared to the jitted version of the function.
A bespoke decorator
In order to make the code more tidy and to keep the integrand function's signature flexible, a bespoke decorator function can be created. It will jit the integrand function and wrap it into a LowLevelCallable object that can then be used with nquad.
from scipy import integrate, LowLevelCallable
from numba import cfunc, jit
from numba.types import intc, CPointer, float64

def jit_integrand_function(integrand_function):
    jitted_function = jit(integrand_function, nopython=True)

    @cfunc(float64(intc, CPointer(float64)))
    def wrapped(n, xx):
        return jitted_function(xx[0], xx[1], xx[2])
    return LowLevelCallable(wrapped.ctypes)

@jit_integrand_function
def circular(x, y, a):
    if x**2 + y**2 < a**2 / 4:
        return 1
    else:
        return 0

a = 4
result = integrate.nquad(circular, [[-a/2, a/2], [-a/2, a/2]], args=(a,))
Arbitrary number of arguments
If the number of arguments is unknown, then we can use the convenient carray function provided by Numba to convert the CPointer(float64) to a Numpy array.
import numpy as np
from scipy import integrate, LowLevelCallable
from numba import cfunc, carray, jit
from numba.types import intc, CPointer, float64

def jit_integrand_function(integrand_function):
    jitted_function = jit(integrand_function, nopython=True)

    @cfunc(float64(intc, CPointer(float64)))
    def wrapped(n, xx):
        ar = carray(xx, n)
        return jitted_function(ar[0], ar[1], ar[2:])
    return LowLevelCallable(wrapped.ctypes)

@jit_integrand_function
def circular(x, y, a):
    if x**2 + y**2 < a[-1]**2 / 4:
        return 1
    else:
        return 0

ar = np.array([1, 2, 3, 4])
a = ar[-1]
result = integrate.nquad(circular, [[-a/2, a/2], [-a/2, a/2]], args=ar)

Is it possible to vectorize scipy fsolve?

I know how to use fsolve in scipy
from scipy.optimize import fsolve
import numpy as np

k = 4

def equations(p):
    x, y = p
    eq_1 = x + y - k
    eq_2 = x - y
    return (eq_1, eq_2)

fsolve(equations, (0, 0))
But I don't know how to vectorize this function if k is a numpy array.
(I could do it in a loop for the different values of k, but that would take a lot of time.)
Is there a solution like:
from scipy.optimize import fsolve
import numpy as np

k = np.arange(10000)

def equations(p):
    x, y = p
    eq_1 = x + y - k
    eq_2 = x - y
    return (eq_1, eq_2)

fsolve(equations, (np.zeros(10000), np.zeros(10000)))
Thanks a lot if you have any idea.
EDIT:
The link that some of you gave increases the calculation time, as each line is not necessarily independent.
For example, you can test these two pieces of code:
import time
import numpy as np
from scipy.optimize import fsolve

s = 1000

# With array
t0 = time.time()
k = np.arange(s)

def equations(p):
    eq_1 = p**2 - k
    return eq_1

res = fsolve(equations, np.ones((s)))
print(time.time() - t0)

# With loop
t0 = time.time()
res = np.zeros((s))
i = 0

def equations(p):
    eq_1 = p**2 - i
    return eq_1

while i < s:
    res[i] = fsolve(equations, 1)[0]
    i += 1
print(time.time() - t0)
Result
10.85175347328186
0.05588793754577637
Is there a way to avoid a loop but keep good speed with a vectorized function?
Not directly.
There is the cython_optimize interface though: https://docs.scipy.org/doc/scipy/reference/optimize.cython_optimize.html
So you can drop to Cython and implement the loops manually. YMMV though.

Numerical integration for matrix values in Python

I am trying to integrate over some matrix entries in Python. I want to avoid loops, because my task includes 1 million simulations. I am looking for a specification that will efficiently solve my problem.
I get the following error: only size-1 arrays can be converted to Python scalars
from scipy import integrate
import numpy.random as npr

n = 1000
m = 30

x = npr.standard_normal([n, m])

def integrand(k):
    return k * x ** 2

integrate.quad(integrand, 0, 100)
This is a simplified example of my case. I have multiple nested functions, such that I cannot simply put x in front of the integral.
Well, you might want to use parallel execution for this. It should be quite easy as long as you just want to execute integrate.quad 30000000 times. Just split your workload into little packages and give it to a thread pool. Of course the speedup is limited by the number of cores you have in your PC. I'm not a Python programmer, but this should be possible. You can also increase the epsabs and epsrel parameters in the quad function; depending on the implementation this should speed up the program as well. Of course you'll get a less precise result, but this might be OK depending on your problem.
import threading
from scipy import integrate
import numpy.random as npr

n = 2
m = 3

x = npr.standard_normal([n, m])

def f(a):
    for j in range(m):
        integrand = lambda k: k * x[a, j]**2
        i = integrate.quad(integrand, 0, 100)
        print(i)  # write it to result array

for i in range(n):
    # pass f and its argument separately so each thread actually runs f
    threading.Thread(target=f, args=(i,)).start()
    # better split it up even more and give it to a threadpool to avoid
    # overhead because of thread init
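As a sketch of the thread-pool refinement mentioned in the comment above (the worker count and tolerances are illustrative, not from this answer):
from concurrent.futures import ThreadPoolExecutor
from scipy import integrate
import numpy.random as npr

n = 2
m = 3
x = npr.standard_normal([n, m])

def integrate_row(a):
    # integrate every entry in row `a`; looser epsabs/epsrel speed quad up
    return [integrate.quad(lambda k: k * x[a, j]**2, 0, 100,
                           epsabs=1e-6, epsrel=1e-6)[0]
            for j in range(m)]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(integrate_row, range(n)))
print(results)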
This is maybe not the ideal solution, but it should help a bit. You can use numpy.vectorize. Even the docs say: "The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop." But still, a %timeit on the simple example you provided shows a 2.3x speedup.
The implementation is:
from scipy import integrate
from numpy import vectorize
import numpy.random as npr

n = 1000
m = 30

x = npr.standard_normal([n, m])

def g(x):
    integrand = lambda k: k * x**2
    return integrate.quad(integrand, 0, 100)

vg = vectorize(g)
res = vg(x)
quadpy (a project of mine) does vectorized quadrature:
import numpy
import numpy.random as npr
import quadpy

x = npr.standard_normal([1000, 30])

def integrand(k):
    return numpy.multiply.outer(x ** 2, k)

scheme = quadpy.line_segment.gauss_legendre(10)
val = scheme.integrate(integrand, [0, 100])
This is much faster than all other answers.

Optimizing root finding algorithm from scipy

I use the root function from scipy.optimize with the method "excitingmixing" in my code because other methods, like standard Newton, don't converge to the roots I am looking for.
However, I would like to optimize my code using numba, which doesn't support the scipy package. I tried to look up the "exciting mixing" algorithm in the documentation to program it myself:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.root.html
I didn't find anything useful except the not really helpful statement that the method "uses a tuned diagonal Jacobian approximation".
I would be glad if someone could tell me something about the algorithm or has an idea on how to optimize the scipy function in another way.
As requested, here is a minimal code example:
import numpy as np
from scipy import optimize
from numba import jit

@jit(nopython=True)
def func(x):
    [a, b, c, d] = x
    da = a*(1-b)
    db = b*(1-c)
    dc = c
    dd = 1
    return [da, db, dc, dd]

@jit(nopython=True)
def getRoot(x0):
    solution = optimize.root(func, x0, method="excitingmixing")
    return solution.x

root = getRoot([0.1, 0.1, 0.2, 0.4])
print(root)
You can look in the source code of scipy to see the implementation of the excitingmixing option:
https://github.com/scipy/scipy/blob/c948e96ebb3454f6a82e9d14021cc601d7ce7a85/scipy/optimize/nonlin.py#L1272
You're likely not going to want to reimplement the entire root finding algorithm in numba. A better strategy you can test is to use numba to optimize the function that you pass to the scipy method. You're still going to pay some overhead of scipy calling a function, but you might see a performance increase if the bottleneck is evaluating the function and that evaluation can be done faster with a numba-jitted version. I've found it best to just experiment with numba and test with timeit.
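A minimal sketch of that strategy (the system of equations below is illustrative, not the one from the question): jit only the callback handed to scipy.optimize.root, trigger compilation once, then time it.
import numpy as np
from numba import njit
from scipy import optimize
import timeit

@njit
def func(x):
    # illustrative system of equations, compiled by numba
    return np.array([x[0]**2 - 2.0, x[1]**2 - 3.0])

func(np.array([1.0, 1.0]))  # warm up: compile before timing

t = timeit.timeit(
    lambda: optimize.root(func, np.array([1.0, 1.0]), method="excitingmixing"),
    number=100)
print(t / 100)  # average seconds per solve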
I wrote a little wrapper to Minpack, called NumbaMinpack, which can be called within numba-compiled functions: https://github.com/Nicholaswogan/NumbaMinpack.
You should try the lmdif method, if Newton's method is failing you.
from NumbaMinpack import lmdif, hybrd, minpack_sig
from numba import njit, cfunc
import numpy as np

@cfunc(minpack_sig)
def myfunc(x, fvec, args):
    fvec[0] = x[0]**2 - args[0]
    fvec[1] = x[1]**2 - args[1]

funcptr = myfunc.address  # pointer to myfunc

x_init = np.array([10.0, 10.0])  # initial conditions
neqs = 2  # number of equations
args = np.array([30.0, 8.0])  # data you want to pass to myfunc

@njit
def test():
    # solve with lmdif
    sol = lmdif(funcptr, x_init, neqs, args)
    # OR solve with hybrd
    sol = hybrd(funcptr, x_init, args)
    return sol

test()  # it works!
