How to import a cached numba function without a Python definition - python

Consider the following function in numba, which just serves as an example:
import numba as nb
import numpy as np
@nb.njit('float64(float64[::1])', cache=True)
def total(x):
    '''Sum the elements of an array.'''
    total = 0
    for i in range(x.shape[0]):
        total += x[i]
    return total
x = np.arange(100,dtype=np.float64)
print(total(x))
Since I have specified the cache=True option, two files are created in the __pycache__ folder: one .nbc file and one .nbi file. I assume that these files contain all the (compiled) information about the function. Let's say I delete the Python file that defines the function (i.e., the above code).
Can I still use the compiled/cached functions? In other words, can I import them without having the original .py file that defined them?

@Michael Szczesny's comment about ahead-of-time compilation is exactly what I wanted. To make it work, I have adapted the code as follows:
from numba.pycc import CC

cc = CC('my_module')

@cc.export('total', 'float64(float64[::1])')
def total(x):
    '''Sum the elements of an array.'''
    total = 0
    for i in range(x.shape[0]):
        total += x[i]
    return total

if __name__ == "__main__":
    cc.compile()
After running this file, a binary extension module (.pyd on Windows, .so on Linux/macOS) is saved in the directory, and you can use it as follows:
import numpy as np
from my_module import total
x = np.arange(100,dtype=np.float64)
print(total(x))

Related

Redefine global JIT function and maintain performance

I'm in the process of developing a PDE package (similar to fenics) and am trying to come up with the best way for a user to define their functions. The current method I am using is fine; however, as the package grows, the run-time compilation of the functions will take much longer.
The current method takes in a dictionary of function specs and redefines and recompiles the global default. As the package gets larger, it would be ideal if everything could be compiled ahead of time. The problem is that for numba's pycc AOT compilation, you can't pass in a function because there is no recognizable signature type (that I could find in the documentation). I have also tried using Cython, but the function stays as a weak reference to a pyobject, which ruins the performance.
The code in the module where the function exists looks like
import linecache

import numpy as np
from numba import njit

@njit
def _user_defined_function(z, t):
    return -z

def define_vector_func(func_specs, **numba_kwargs):
    # func_specs is a dict, numba_kwargs are for the jit decorator
    # create a string to be compiled into a new function
    exprs = []
    for var in func_specs:
        s = func_specs[var]
        for i, (variable, expression) in enumerate(func_specs.items()):
            s = s.replace(variable, f'z[{i}]')
        exprs.append(s)
    fp = '<ipython-cache-safe>'  # needed to allow for caching
    src = ', '.join(exprs)
    src = f'def local_func(z, t): return np.array([{src}])\n'
    lines = [src]
    linecache.cache[fp] = (len(src), None, lines, fp)
    code = compile(src, fp, 'exec')
    exec(code)
    # redefine the global function and return a reference if the user wants to use it
    local_func_ptr = vars()['local_func']
    local_func_ptr(np.ones(len(func_specs)), 0.0)
    global _user_defined_function
    _user_defined_function = njit(**numba_kwargs)(local_func_ptr)
    _rec()  # recompile other functions to recognize the new function
    return _user_defined_function

# rest of module
# ...
To use it from a separate module, you would do something like
from module_above import define_vector_func, other_funcs_from_module

if __name__ == '__main__':
    func_spec = {'v': 'v - (v * v * v) / 3.0 - w + 0.08',
                 'w': '0.08 * (v - 0.8 * w + 0.7)'}
    f = define_vector_func(func_spec, fastmath=True, cache=True)
    # do what the user wants
What would be the best route for compiling the module ahead of time while maintaining performance?

using functions with variables from other python files at notebook's directory

For example, in my folder I have my IPython notebook "program.ipynb" and a Python file "functions.py" which has some functions in it, for example "func":
from numpy import sqrt

def func(x):
    return N + sqrt(x)
that is going to be used in "program.ipynb", which looks like this:
from functions import func
N = 5
func(2)
NameError: name 'N' is not defined
To fix the bug I need to define the variable N in my functions.py file, but isn't there a way around that? I want to define all my global variables in my main program (program.ipynb).
You can't access a variable like that; the best way would be:
functions.py
from numpy import sqrt

def func(x, N):
    return N + sqrt(x)
program.ipynb
from functions import func
N = 5
func(2, N)
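If you would rather not pass N explicitly on every call, one alternative (a sketch using the standard library's functools with the same hypothetical func) is to bind the notebook's value once with functools.partial:

```python
from functools import partial
from numpy import sqrt

def func(x, N):
    # same signature as the fixed functions.py above
    return N + sqrt(x)

# in program.ipynb: bind the notebook's N once, then call without repeating it
N = 5
func_with_n = partial(func, N=N)
print(func_with_n(4))  # 5 + sqrt(4) = 7.0
```

This still avoids a global in functions.py; the binding lives in the notebook, so all "global" configuration stays in program.ipynb.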

Python: cannot import name x for importing module

EDIT: Copy-pasting my actual file to ease confusion. The code snippet below is in a file named train_fm.py:
def eval_fm(x, b, w, V):
    # evaluate a degree-2 FM. x is p x B
    # V is p x k
    # some python code that computes yhat
    return yhat
Now, in my main file, I do the following:
from train_fm import eval_fm
and I get the error:
ImportError: cannot import name f1
When I type
from train_fm import train_fm
I do not get an error.
OLD QUESTION BELOW:
def train_fm(x, y, lb, lw, lv, k, a, b, w, V):
    # some code
    yhat = eval_fm(x, b, w, V)
    # OUTPUTS
    return (b, w, V)
I have a file called f2.py, where I define 2 functions (note that one of the functions has the same name as the file)
def f1():
    # some stuff
    return stuff

def f2():
    # more stuff
    y = f1()
    return y
In my main file, I do
from aaa import f1
from aaa import f2
but when I run the first of the 2 commands above, I get
ImportError: cannot import name f1
Any idea what is causing this? The second function gets imported fine.
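An ImportError: cannot import name ... very often means Python is loading a different file than the one you are editing: a stale .pyc, or another module or package with the same name earlier on sys.path. A quick diagnostic sketch (using the stdlib json module as a stand-in; for this question you would pass 'train_fm' or 'f2'):

```python
import importlib

def diagnose(module_name):
    """Report where a module was actually loaded from and its public names."""
    mod = importlib.import_module(module_name)
    return mod.__file__, sorted(n for n in dir(mod) if not n.startswith('_'))

path, names = diagnose('json')  # e.g. diagnose('train_fm') for the question's case
print(path)               # the file Python really imported
print('dumps' in names)   # check whether the name you expect is defined there
```

If the path is not the file you think it is, or the expected name is missing from the list, you have found the mismatch.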

return value to vba from xlwings

Let's use the example on xlwings documentation.
Given the following python code:
import numpy as np
from xlwings import Workbook, Range

def rand_numbers():
    """Produces std. normally distributed random numbers with shape (n, n)."""
    wb = Workbook.caller()  # creates a reference to the calling Excel file
    n = int(Range('Sheet1', 'B1').value)  # write desired dimensions into cell B1
    rand_num = np.random.randn(n, n)
    Range('Sheet1', 'C3').value = rand_num
This is the original example.
Let's say we modify it slightly to be:
import numpy as np
from xlwings import Workbook, Range

def rand_numbers():
    """Produces std. normally distributed random numbers with shape (n, n)."""
    wb = Workbook.caller()  # creates a reference to the calling Excel file
    n = int(Range('Sheet1', 'B1').value)  # write desired dimensions into cell B1
    rand_num = np.random.randn(n, n)
    return rand_num  # modified line
And we call it from VBA using the following call:
Sub MyMacro()
    Dim z ' new line
    z = RunPython("import mymodule; mymodule.rand_numbers()")
End Sub
We get z as an empty value.
Is there any way to return a value to vba directly without writing to a text file, or putting the value first in the excel document?
Thank you for any pointers.
RunPython does not allow you to return values, as per the xlwings documentation.
To overcome this issue, use UDFs; see VBA: User Defined Functions (UDFs). Note, however, that this is currently limited to Windows only.
https://docs.xlwings.org/en/stable/udfs.html#udfs
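As a rough sketch of the UDF route (assuming a recent xlwings with UDF support; the try/except fallback is only there so the Python logic can run and be tested without Excel or xlwings installed):

```python
import numpy as np

try:
    import xlwings as xw
    udf = xw.func  # xlwings UDF decorator (Windows only)
except ImportError:
    udf = lambda f: f  # no-op fallback: keeps the function usable without xlwings

@udf
def rand_numbers(n):
    """Call from a cell as =rand_numbers(B1); xlwings writes the array back."""
    n = int(n)
    return np.random.randn(n, n)
```

After importing the module through the xlwings add-in, the return value lands directly in the calling cell range, with no RunPython call and no intermediate file.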

python itertools product slow - is the write speed to an output file a bottleneck?

I have a simple Python function that runs itertools.product, as seen below.
def cart(n, seq):
    import itertools
    b = 8
    while b < n:
        n = n - 1
        for p in itertools.product(seq, repeat=n):
            file.write(''.join(p))
            file.write('\n')
The function works, but it is extremely slow, and it is not even using a noticeable amount of resources. I was wondering if the bottleneck was the disk write speed? Currently the script is averaging 2.5 MB per second. I also tried this on a solid-state drive and received the same speeds, which leads me to believe the write speed is not the bottleneck. Is there a way to speed this function up and use more system resources? Or is itertools just slow? Forgive me, I am new to Python.
You can profile your code to get an idea of the location of the bottleneck. The following will create a file called "cart_stats.txt" with the profiling information in it. Running it myself seems to indicate that most of the time is being spent calling file.write().
from cProfile import Profile
from pstats import Stats

prof = Profile()
prof.disable()
file = open('cart_output.txt', 'wt')

def cart(n, seq):
    import itertools
    b = 8
    while b < n:
        n = n - 1
        for p in itertools.product(seq, repeat=n):
            file.write(''.join(p))
            file.write('\n')

prof.enable()
cart(10, 'abc')
prof.disable()
prof.dump_stats('cart.stats')

with open('cart_stats.txt', 'wt') as output:
    stats = Stats('cart.stats', stream=output)
    stats.sort_stats('cumulative', 'time')
    stats.print_stats()

file.close()
print('done')
FWIW, the slowness seems to be overwhelmingly due to the calls to file.write() itself, because it's still there even if I open() the output stream with a huge buffer or make it a StringIO instance. I was able to reduce it significantly by optimizing and minimizing the calls to it, as shown below:
def cart(n, seq):
    import itertools
    b = 8
    write = file.write  # speed up lookup of the method
    while b < n:
        n = n - 1
        for p in itertools.product(seq, repeat=n):
            write(''.join(p) + '\n')  # only call it once in the loop
This demonstrates that having a profiler in place can be the best way to know where to spend your time and get the most benefit.
Update:
Here's a version that stores all the output generated in memory before making a single file.write() call. It is significantly faster than using StringIO.StringIO because it's less general, but it is still not quite as fast as using a cStringIO.StringIO instance.
file = open('cart_output.txt', 'wt')

def cart(n, seq):
    from itertools import product
    buflist = []
    append = buflist.append
    b = 8
    while b < n:
        n = n - 1
        for p in product(seq, repeat=n):
            append(''.join(p))
    file.write('\n'.join(buflist) + '\n')

file.close()
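If the full product is too large to hold in memory at once, a middle ground (a sketch of the same buffering idea, not from the original answer) is to flush the buffer in fixed-size chunks, bounding memory use while still keeping write() calls rare:

```python
import io
from itertools import product

def cart_chunked(n, seq, out, chunk=10000):
    """Same algorithm as cart(), but flush every `chunk` lines to bound memory."""
    b = 8
    buflist = []
    append = buflist.append
    while b < n:
        n = n - 1
        for p in product(seq, repeat=n):
            append(''.join(p))
            if len(buflist) >= chunk:
                out.write('\n'.join(buflist) + '\n')
                del buflist[:]  # clear in place so `append` stays valid
    if buflist:
        out.write('\n'.join(buflist) + '\n')

# demo on an in-memory stream; use open('cart_output.txt', 'wt') for a real file
buf = io.StringIO()
cart_chunked(10, 'ab', buf, chunk=100)
print(len(buf.getvalue().splitlines()))  # 2**9 + 2**8 = 768 lines
```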
