What is the best way to have better, dynamic control over the decorators, choosing from numba.cuda.jit, numba.jit, and none (pure Python)? [Please note that a project can have tens or hundreds of functions, so this should be easy to apply to all of them.]
Here is an example from the Numba website.
import numba as nb
import numpy as np

# global control of this --> @nb.jit or @nb.cuda.jit or none
# some functions with @nb.jit or @nb.cuda.jit with kwargs like (nopython=True, **other_kwargs)
def sum2d(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i, j]
    return result

a = np.arange(81).reshape(9, 9)
sum2d(a)
You may want something more sophisticated, but a relatively simple solution is to redefine jit based on settings. For example:
def _noop_jit(f=None, *args, **kwargs):
    """returns the function unmodified, discarding decorator args"""
    if f is None:
        return lambda x: x
    return f

# some config flag
if settings.PURE_PYTHON_MODE:
    jit = _noop_jit
else:  # etc.
    from numba import jit

@jit(nopython=True)
def f(a):
    return a + 1
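To also cover numba.cuda.jit, the same idea extends to a three-way switch. The sketch below is an illustration only: the BACKEND flag is a made-up config name, and since CUDA kernels cannot return values and are launched as f[blocks, threads](...), simply swapping in nb.cuda.jit only works for functions already written as kernels (or as device functions via cuda.jit(device=True)).

import numba as nb

# Hypothetical config value: "cuda", "cpu", or "python"
BACKEND = "cpu"

def _noop_jit(f=None, *args, **kwargs):
    """Return the function unmodified, discarding decorator args."""
    if f is None:
        return lambda func: func
    return f

if BACKEND == "cuda":
    # only appropriate for functions written as CUDA kernels / device functions
    jit = nb.cuda.jit
elif BACKEND == "cpu":
    jit = nb.jit
else:
    jit = _noop_jit

@jit
def add_one(a):
    # trivial example; with BACKEND = "cpu" this is lazily compiled by nb.jit
    return a + 1

Since every module imports jit from the one settings module, switching the backend for tens or hundreds of functions is then a one-line change.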
I have a program in Python and I use Numba to compile the code to native code so it runs faster.
I want to accelerate the run even further and implement a cache for function results: if the function is called twice with the same parameters, the first time the calculation will run and return the result, and the second time the function will return the result from the cache.
I tried to implement this with a dict, where the keys are tuples containing the function parameters, and the values are the function return values.
However, numba doesn't support dictionaries and the support for global variables is limited, so my solution didn't work.
I can't use a numpy.ndarray and use the indices as the parameters, since some of my parameters are floats.
The problem is that both the function with cached results and the calling function are compiled with Numba (if the calling function were a regular Python function, I could cache using just Python and not Numba).
How can I implement this result cache with numba?
============================================
The following code gives an error, saying the Memoize class is not recognized
from __future__ import annotations

from numba import njit


class Memoize:
    def __init__(self, f):
        self.f = f
        self.memo = {}

    def __call__(self, *args):
        if args not in self.memo:
            self.memo[args] = self.f(*args)
        # Warning: You may wish to do a deepcopy here if returning objects
        return self.memo[args]


@Memoize
@njit
def bla(a: int, b: float):
    for i in range(1_000_000_000):
        a *= b
    return a


@njit
def caller(x: int):
    s = 0
    for j in range(x):
        s += bla(j % 5, (j + 1) % 5)
    return s


if __name__ == "__main__":
    print(caller(30))
The error:
Untyped global name 'bla': Cannot determine Numba type of <class '__main__.Memoize'>

File "try_numba2.py", line 30:
def caller(x: int):
    <source elided>
    for j in range(x):
        s += bla(j % 5, (j + 1) % 5)
        ^
Changing the order of the decorators for bla gives the following error:
TypeError: The decorated object is not a function (got type <class '__main__.Memoize'>).
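For what it's worth, here is a minimal sketch of the pure-Python boundary hinted at in the question: keep the expensive body compiled with @njit, put the cache on a plain Python wrapper (functools.lru_cache here), and keep the caller interpreted so it can use the cached wrapper. The _bla_impl/bla split is an assumption introduced for illustration, not something Numba provides.

from functools import lru_cache

from numba import njit


@njit
def _bla_impl(a: int, b: float):
    # compiled, uncached core
    for i in range(1_000_000_000):
        a *= b
    return a


@lru_cache(maxsize=None)
def bla(a, b):
    # plain Python wrapper: the hashable arguments become the cache key
    return _bla_impl(a, b)


def caller(x: int):
    # stays interpreted so that repeated (a, b) pairs hit the cache
    s = 0
    for j in range(x):
        s += bla(j % 5, (j + 1) % 5)
    return s


if __name__ == "__main__":
    print(caller(30))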
The Numba documentation specifies that compiled functions can call, and be inlined into, other compiled functions. This does not seem to be true when compiling ahead of time.
For example, here are two functions that compute the inner dot product between two vector arrays: one does the actual product, the other calls it inside a loop:
# Module test.py
import numpy as np
from numba import njit, float64


@njit(float64(float64[:], float64[:]))
def product(a, b):
    prod = 0
    for i in range(a.size):
        prod += a[i] * b[i]
    return prod


@njit(float64[:](float64[:, :], float64[:, :]))
def n_inner1d(a, b):
    prod = np.empty(a.shape[0])
    for i in range(a.shape[0]):
        prod[i] = product(a[i], b[i])
    return prod
As is, I can do import test and use test.n_inner1d perfectly fine. Now let's do some modifications so this can be compiled to a .pyd:
# Module test.py
import numpy as np
from numba import float64
from numba.pycc import CC

cc = CC('test')
cc.verbose = True


@cc.export('product', 'float64(float64[:], float64[:])')
def product(a, b):
    prod = 0
    for i in range(a.size):
        prod += a[i] * b[i]
    return prod


@cc.export('n_inner1d', 'float64[:](float64[:,:], float64[:,:])')
def n_inner1d(a, b):
    prod = np.empty(a.shape[0])
    for i in range(a.shape[0]):
        prod[i] = product(a[i], b[i])
    return prod


if __name__ == "__main__":
    cc.compile()
When trying to compile, I get the following error:
# python test.py
Failed at nopython (nopython frontend)
Untyped global name 'product': cannot determine Numba type of <type 'function'>
File "test.py", line 20
QUESTION
For a module compiled ahead of time, is it possible for functions defined within to call one another and be used inline?
I reached out to the Numba devs and they kindly answered that adding the @njit decorator after @cc.export will make function-call type resolution work.
So for example:
@cc.export('product', 'float64(float64[:], float64[:])')
@njit
def product(a, b):
    prod = 0
    for i in range(a.size):
        prod += a[i] * b[i]
    return prod
This will make the product function available to the others. The caveat is that in some cases the inlined function may end up with a different type signature from the one declared for AOT compilation.
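One way to reduce the risk of that mismatch, sketched here as a suggestion rather than something the devs confirmed, is to give @njit the same explicit signature as the export string, so both paths resolve against identical types:

from numba import njit, float64
from numba.pycc import CC

cc = CC('test')


@cc.export('product', 'float64(float64[:], float64[:])')
@njit(float64(float64[:], float64[:]))  # pin the same signature used for the export
def product(a, b):
    prod = 0.0
    for i in range(a.size):
        prod += a[i] * b[i]
    return prod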
I am trying to do some timing comparisons using numba.
What I don't understand in the following mwe.py is why I get such different timings:
from __future__ import print_function

import numpy as np
from numba import autojit
import time


def timethis(method):
    '''decorator for timing function calls'''
    def timed(*args, **kwargs):
        ts = time.time()
        result = method(*args, **kwargs)
        te = time.time()
        print('{!r} {:f} s'.format(method.__name__, te - ts))
        return result
    return timed
def pairwise_pure(x):
    '''sample function, compute pairwise distance, see: jakevdp.github.io/blog/2013/06/15/numba-vs-cython-take-2/'''
    M, N = x.shape
    D = np.empty((M, M), dtype=np.float)
    for i in range(M):
        for j in range(M):
            d = 0.
            for k in range(N):
                tmp = x[i, k] - x[j, k]
                d += tmp * tmp
            D[i, j] = np.sqrt(d)
    return D
# first version
@timethis
@autojit
def pairwise_numba(args):
    return pairwise_pure(args)


# second version
@timethis
def pairwise_numba_alt(args):
    return autojit(pairwise_pure)(args)


x = np.random.random((1e3, 10))
pairwise_numba(x)
pairwise_numba_alt(x)
Evaluating python3 mwe.py gives this output:
'pairwise_numba' 5.971631 s
'pairwise_numba_alt' 0.191500 s
In the first version, I decorate the method with timethis to measure the timings and with autojit to speed up the code, whereas in the second one I decorate the function with timethis only and apply autojit(...) to pairwise_pure inside the function body.
Does someone have an explanation?
Actually, the documentation explicitly states that, for optimization, every function called "inside" a decorated function should be decorated as well, or it isn't optimized.
For many functions, like NumPy functions, that isn't necessary, since they are highly optimized, but for native Python functions it is.
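For illustration, here is a sketch of the fix this implies for the example above: decorate pairwise_pure itself, so the jitted wrapper calls compiled code instead of an interpreted global. It is written with numba.jit, since autojit is no longer available in recent Numba releases:

import numpy as np
from numba import jit


@jit(nopython=True)
def pairwise_pure(x):
    # same pairwise-distance kernel as above, now compiled
    M, N = x.shape
    D = np.empty((M, M), dtype=np.float64)
    for i in range(M):
        for j in range(M):
            d = 0.0
            for k in range(N):
                tmp = x[i, k] - x[j, k]
                d += tmp * tmp
            D[i, j] = np.sqrt(d)
    return D


@jit(nopython=True)
def pairwise_numba(x):
    # the call below is now a call into another compiled function
    return pairwise_pure(x)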
Suppose I have a complex mathematical function with many input parameters P = [p1, ..., pn]. Suppose that I can factor the function in blocks, for example:
f(P) = f1(p1, p2) * f2(p2, ..., pn)
and maybe
f2(p2, ..., pn) = p2 * f3(p4) + f4(p5, ..., pn)
Suppose I have to evaluate f for many values of P, for example because I want to find the minimum of f. Suppose I have already computed f(P) and I need to compute f(P'), where P' is equal to P except for p1. In this case I don't have to recompute f2, f3, f4, but only f1.
Is there a library which helps me implement this kind of caching system? I know RooFit, but it is oriented toward statistical models built from blocks. I am looking for something more general. scipy / scikits and similar are preferred, but C++ libraries are also fine. Does this technique have a name?
If you can write these functions to be pure functions (which means that they always return the same value for the same arguments and have no side effects), you can use memoization, which is a method for saving the results of function calls.
try:
    from functools import lru_cache  # Python 3.2+
except ImportError:
    # Python 2 and Python 3.0-3.1
    # Requires the third-party functools32 package
    from functools32 import lru_cache


@lru_cache(maxsize=None)
def f(arg):
    # expensive operations
    return result


x = f('arg that will take 10 seconds')  # takes 10 seconds
y = f('arg that will take 10 seconds')  # virtually instant
For illustration, or if you don't want to use functools32 on Python < 3.2:
def memoize(func):
    memo = {}

    def wrapper(*args):
        if args not in memo:
            memo[args] = func(*args)
        return memo[args]
    return wrapper


@memoize
def f(arg):
    # expensive operations
    return result


x = f('arg that will take 10 seconds')  # takes 10 seconds
y = f('arg that will take 10 seconds')  # virtually instant
I wrote a function "rep" that takes a function f and returns the n-fold composition of f.
So rep(square, 3) behaves like square(square(square(x))).
And when I pass 3 into it, rep(square, 3)(3) = 6561.
There is no problem with my code, but I was wondering if there was a way to make it "prettier" (or shorter) without having to call another function or import anything. Thanks!
def compose1(f, g):
    """Return a function h, such that h(x) = f(g(x))."""
    def h(x):
        return f(g(x))
    return h


def rep(f, n):
    newfunc = f
    count = 1
    while count < n:
        newfunc = compose1(f, newfunc)
        count += 1
    return newfunc
If you're looking for speed, the for loop is clearly the way to go. But if you're looking for theoretical academic acceptance ;-), stick to terse functional idioms. Like:
def rep(f, n):
    return f if n == 1 else lambda x: f(rep(f, n - 1)(x))
def rep(f, n):
    def repeated(x):
        for i in range(n):
            x = f(x)
        return x
    return repeated
Using a for loop instead of while is shorter and more readable, and compose1 doesn't really need to be a separate function.
While I agree that repeated composition of the same function is best done with a loop, you could use *args to compose an arbitrary number of functions:
def identity(x):
    return x


def compose(*funcs):
    if funcs:
        rest = compose(*funcs[1:])
        return lambda x: funcs[0](rest(x))
    else:
        return identity
And in this case you would have:
def rep(f, n):
    funcs = (f,) * n  # tuple with f repeated n times
    return compose(*funcs)
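As a quick sanity check against the numbers in the question (assuming the square function from there):

def square(x):
    return x * x

rep(square, 3)(3)  # 6561, i.e. square(square(square(3)))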
And as DSM kindly pointed out in the comments, you could remove the recursion like so:
def compose(*funcs):
    if not funcs:
        return identity
    else:
        def composed(x):
            for f in reversed(funcs):
                x = f(x)
            return x
        return composed
(also note that you can replace x with *args if you also want to support arbitrary arguments to the functions you're composing, but I left it at one argument since that's how you have it in the original problem)
Maybe someone will find this solution useful.
Compose a number of functions:
from functools import reduce


def compose(*functions):
    return reduce(lambda x, y: (lambda arg: x(y(arg))), functions)
Use a list comprehension to generate the list of functions:
def multi(how_many, func):
    return compose(*[func for num in range(how_many)])
Usage
def square(x):
    return x * x


multi(3, square)(3) == 6561