Calling Python function from Cython code, inside a Google Colab notebook

I am working in a Google Colab notebook. One particular, computationally intensive piece of code I am writing in Cython, in the same notebook. Within this piece of code, I want to call a function (defined in Python, in another cell of the same notebook).
Now, that function is heavily integrated with the rest of my pure Python code, so rewriting or redefining it for Cython is not feasible.
My question is: How do I call that function written in Python, from another cell that is getting compiled in Cython?
A link I have already looked at:
Call python file with python imports from C using cython

Normally, you would put the whole functionality into a module and import it in the %%cython-cell.
Another, less clean (but in the case of a notebook probably acceptable) way is to import from __main__, e.g.:
[1]: def foo():
         print("I'm main foo")
and then:
[2]: %%cython
def usefoo():
    from __main__ import foo
    foo()
and now:
[3]: usefoo()
I'm main foo
Another variant would be to import foo from __main__ slightly differently:
[2]: %%cython
from __main__ import foo
def usefoo2():
    foo()
There are two main differences:

- If foo isn't (yet) defined in __main__, the second %%cython cell will fail to build. The first version will only fail if foo is not (or is no longer) defined in __main__ at the moment usefoo is called.
- If foo is redefined in __main__, the first version will pick up the current definition, while the second will keep the definition captured when the %%cython cell was built (which, due to caching, might not be the same moment the cell was run). This can be quite confusing.

In the long run this approach is quite confusing and puzzling, so after a short try-out phase I would switch to a more sustainable approach using dedicated modules.
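For illustration, here is a minimal sketch of that module-based setup in Colab (the file name mymod.py and the function heavy are placeholder names, not from the question):

[4]: %%writefile mymod.py
def heavy(x):
    # stand-in for the computationally intensive pure-Python function
    return x * x

[5]: %%cython
from mymod import heavy
def use_heavy(n):
    return heavy(n)

Because mymod is a real module on disk, the %%cython cell can import it like any other package, with none of the __main__ timing pitfalls above.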

Related

Python - Performance difference between importing a function and locally declaring it?

Is there a significant difference in performance between importing a function versus declaring it in the current file in Python?
I have a small function (one-liner) that I use often in several .py files in my program. I wish to instead define it once so that changes I make to it are reflected everywhere. However, I am not sure whether using it as an imported function will add additional overhead when calling it...
I doubt there is a difference between calling a locally declared function and calling an imported one. There is, however, a small difference between executing a line of code inline and calling a function to execute the same code. This should help in case my wording was a bit confusing.
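To make the second point concrete, here is a small sketch (my addition, not from the original answer) timing an inline expression against the same expression wrapped in a function call:

import timeit

# the expression executed inline
print(timeit.timeit("'Python'.lower()"))
# the same expression behind a function call adds call overhead
print(timeit.timeit("f()", setup="def f(): 'Python'.lower()"))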
Hi Jet Blue, for a better understanding have a look at the Python wiki's PerformanceTips page:
import statements can be executed just about anywhere. It's often useful to place them inside functions to restrict their visibility and/or reduce initial startup time. Although Python's interpreter is optimized to not import the same module multiple times, repeatedly executing an import statement can seriously affect performance in some circumstances.
Consider the following two snippets of code (originally from Greg McFarlane, I believe - I found it unattributed in a comp.lang.python python-list@python.org posting and later attributed to him in another source):
def doit1():
    import string  ###### import statement inside function
    string.lower('Python')

for num in range(100000):
    doit1()
or:
import string  ###### import statement outside function
def doit2():
    string.lower('Python')

for num in range(100000):
    doit2()
doit2 will run much faster than doit1, even though the reference to the string module is global in doit2. Here's a Python interpreter session run using Python 2.3 and the new timeit module, which shows how much faster the second is than the first:
>>> def doit1():
...     import string
...     string.lower('Python')
...
>>> import string
>>> def doit2():
...     string.lower('Python')
...
>>> import timeit
>>> t = timeit.Timer(setup='from __main__ import doit1', stmt='doit1()')
>>> t.timeit()
11.479144930839539
>>> t = timeit.Timer(setup='from __main__ import doit2', stmt='doit2()')
>>> t.timeit()
4.6661689281463623
String methods were introduced to the language in Python 2.0. These provide a version that avoids the import completely and runs even faster:
def doit3():
    'Python'.lower()

for num in range(100000):
    doit3()
Here's the proof from timeit:
>>> def doit3():
...     'Python'.lower()
...
>>> t = timeit.Timer(setup='from __main__ import doit3', stmt='doit3()')
>>> t.timeit()
2.5606080293655396
The above example is obviously a bit contrived, but the general principle holds.
Note that putting an import in a function can speed up the initial loading of the module, especially if the imported module might not be required. This is generally a case of a "lazy" optimization -- avoiding work (importing a module, which can be very expensive) until you are sure it is required.
This is only a significant saving in cases where the module wouldn't have been imported at all (from any module) -- if the module is already loaded (as will be the case for many standard modules, like string or re), avoiding an import doesn't save you anything. To see what modules are loaded in the system look in sys.modules.
A good way to do lazy imports is:
email = None

def parse_email():
    global email
    if email is None:
        import email
This way the email module will only be imported once, on the first invocation of parse_email().

Cython and Python class coexistence in ipython?

What is the best approach to have the same classes coexist in an IPython environment?
I want to gradually migrate some classes in a hierarchy (starting from the root classes) from Python to Cython.
I want to be able to have both versions running (in the IPython env), so I can compare performance and fall back when needed.
Is there some other approach that may work, even if not exactly the way I want?
For my current experiments I started with renaming the classes and importing them separately.
E.g.:
import pyximport; pyximport.install()
from blah import *    # pure-Python versions
from c_blah import *  # Cython versions
b = Blah()
cb = c_Blah()
%timeit -n 1000 b.op()
%timeit -n 1000 cb.op()
That is cumbersome because I had to rename all class-attribute and method accesses.
Also, this does not solve my dilemma once I go down the hierarchy.
Any other ideas on how to approach this?
I mean incremental recoding in Cython.
cswiercz makes a good point:
from blah import Blah
from c_blah import Blah as cBlah
This is OK, but supporting the hierarchy will require modifications.
Will keep it open for other ideas.
You can use the import x as y method for renaming definitions within modules. For example, if foo.py contains a function func and bar.pyx also contains the function func (perhaps this is you trying to write a Cython version of foo.func()) then you can do the following in your timing script:
# separate imports
from foo import func as func_py
from bar import func as func_cy
# use the code
func_py(2.0)
func_cy(2.0)
This way you can keep a meaningful naming scheme within the foo and bar modules.
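One way to soften the hierarchy problem (a sketch only; the USE_CYTHON flag and the Derived class are illustrative, not from the answers above) is to bind the root class to an alias at import time, so that subclasses are written once against whichever implementation is currently selected:

# pick the root implementation once; subclasses derive from the alias
USE_CYTHON = True
if USE_CYTHON:
    import pyximport; pyximport.install()
    from c_blah import Blah as Base
else:
    from blah import Blah as Base

class Derived(Base):
    def op_twice(self):
        return self.op() + self.op()

Flipping USE_CYTHON then switches the whole hierarchy between the Python and Cython root without renaming anything.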

Python callbacks into the main module (functions and variables)

I am trying to have a module that provides a framework of sorts for short, declarative one-file scripts that I write. In the short scripts, I would like to define a number of variables and functions that are accessed or called back by the framework module.
I am stuck ... I have tried two approaches, and I am not crazy about either one. Is there a better way?
First approach:
Framework module:
import __main__

def test():
    print __main__.var
    print __main__.func()
Script:
import framework

var = "variable"

def func():
    print "function"

framework.test()
I like this approach and it works, but Eclipse gives me an 'Undefined variable from import: var' error on any variable or function imported from __main__. Obviously this is not correct, as the code works, but it clutters up Eclipse with many false errors.
Second Approach:
Framework module:
def test(glob):
    print glob['var']
    print glob['func']()
Script:
import framework

var = "variable"

def func():
    print "function"

framework.test(globals())
This seems to work and does not give me errors, but I don't like the dictionary-style notation used to access variables and functions ... especially for functions: glob['func']().
Is there a better way to implement this that results in clean code? Can I pass a module name (i.e. __main__) as an argument to a function? Can the Eclipse errors on __main__ in the first approach be turned off?
I went with the first approach and told Eclipse to ignore the Undefined Variable from Import error by pressing command+1 (for my Mac, for Windows it would be ctrl+1) on each line where this error occurred.
command+1 adds the following comment to the end of each line:
# #UndefinedVariable
Thanks to @ekhumoro for pointing this out in the comments.
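As an aside, regarding the question of passing a module as an argument: a module object can be passed around like any other value, which avoids both the __main__ import and the globals() dictionary. A minimal sketch (my own illustration, reusing the framework/script layout above):

Framework module:
def test(mod):
    print mod.var
    print mod.func()

Script:
import sys
import framework

var = "variable"

def func():
    return "function"

# pass the calling module object itself
framework.test(sys.modules[__name__])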

Struggling with Python timeit

I'm struggling with the timeit function in Python, and, on a deeper level, I find myself very frustrated by the quirks of this function. I'm hoping I can get some help with both issues here.
I have a script (call it my_script.py) with a lot of different function definitions, and then a lot of other stuff being calculated below them all. I want to time only one of these functions in particular - let's call it level_99_function(x). I have a big array stored in my_input. My first attempt:
timeit.timeit('f1(x)', setup = 'my_input')
Python returns the error: NameError: global name 'angle' is not defined.
Now my second attempt is to do the following:
print timeit.timeit('level_99_function(x)', setup='import numpy as np; import my_script; x = np.linspace(0, 100)')
This doesn't generate any errors, but the problem is two-fold. First, and most importantly, it still doesn't time level_99_function (or maybe it just never prints the timer's output, for whatever reason). Second, the import statement seems to run the entire script on import, which takes forever because of all the stuff in the script aside from level_99_function.
How do I get the timing of the function in question here? And on a more philosophical level, why is this such a struggle in Python? I've already got a variable and a function defined; all I want to do is time that function call with that variable. It would be nice to not have to write a super long line of code, or write multiple lines of code, or have to import things or any of that stuff. It's as easy as tic and toc in Matlab. I guess the corresponding Python commands would be to use 'time.clock()' before and after the function call, but I've read that this can be inaccurate and misleading.
You don't need to import numpy every time within setup; instead, you can import the functions and variables you want from the current script with from __main__ import ..., as shown in the example below.
import timeit
import numpy as np

def func1(x):
    pass

def func2(x):
    pass

def func3(x):
    return np.array(x > 1000)

if __name__ == '__main__':
    x = np.arange(10000)
    time = timeit.timeit('func3(x)', setup='from __main__ import func3, x', number=1000)
    print(time)
The if __name__ == '__main__' block will prevent the code within the if statement from being run if you import the code from another script, meaning you won't accidentally run your timing tests when you import your functions.
This code only imports func3 and x. I'm only interested in func3 (not func1 and func2) and I've defined a value to test with (I call it x but it's equivalent to your my_input). You don't need to import numpy in this case.
I would however completely and utterly advise you to take roippi's comment into consideration and use IPython. The %timeit magic method is very, very useful.
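For reference, once the function and data are defined in an IPython or notebook session, the whole measurement collapses to a single line (using the question's names):

%timeit level_99_function(my_input)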
As an FYI for the future:
I recently submitted a patch against issue2527, which was committed a few days ago to the default branch. So whenever 3.5 is publicly released, you can do this:
timeit.timeit('level_99_function(x)', globals=globals())
Not quite as awesome as IPython's %timeit, I know, but far better than the from __main__ import ... nonsense that you have to do right now. More info in the docs.

Reloading a changed python file in emacs python shell

In the Emacs Python shell (I'm running Python 2.*) I am importing a .py file I'm working on and testing the code. If I change the code, however, I'm not sure how to import it again.
From my reading so far it seems that
reload(modulename)
should work, but it doesn't seem to.
Perhaps just shutting down the Python shell and restarting it would be enough; is there a command for that, or do you just do it manually?
edit: It looks like python-send-defun and python-send-buffer would be ideal, but changes don't seem to be propagating.
While reload() does work, it doesn't change existing references to classes, functions, and other objects, so it's easy to end up looking at an old version. The most consistent solution is to replace reload() with either exec (which means not using import in the first place) or restarting the interpreter entirely.
If you do want to continue using reload, be very careful about how you reference things from that module, and always use the fully qualified name: import module and use module.name instead of from module import name. Even being careful, you will still run into problems with stale objects, which is one reason the reload() builtin is gone in 3.x.
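A minimal sketch of the exec alternative (my own illustration; test.py matches the file used in the answer below): re-execute the file's source into a dictionary instead of importing it, so every run picks up the latest definitions:

ns = {}
with open('test.py') as f:
    exec(f.read(), ns)  # re-run the current source into ns
ns['foo']()             # always calls the freshest definition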
It seems to work for me:
Make a file (in your PYTHONPATH) called test.py
def foo():
    print('bar')
Then in the emacs python shell (or better yet, the ipython shell), type
>>> import test
>>> test.foo()
bar
Now modify test.py:
def foo():
    print('baz')
>>> reload(test)
<module 'test' from '/home/unutbu/pybin/test.py'>
>>> test.foo()
baz
After looking at this issue for quite some time, I came to the conclusion that the best solution is either based on an initialisation file for your Python interpreter (IPython, for example), or uses Python's built-in module imp and its reload function. For instance, at the beginning of your code:
import my_module
import imp
imp.reload(my_module)
# your code
This solution came to me from this page: https://emacs.stackexchange.com/questions/13476/how-to-force-a-python-shell-to-re-import-modules-when-running-a-buffer
