What is the best approach to have two versions of the same classes coexist in an IPython environment?
I want to gradually migrate some classes in hierarchy (starting from the root classes) from Python to Cython.
I want to be able to have both versions running (in the IPython env), so I can compare performance and fall back when needed.
Is there some other approach that may work, even if not exactly the way I want?
For my current experiments I started with renaming the classes and importing them separately.
For example:
import pyximport; pyximport.install()
from blah import *
from c_blah import *
b = Blah()
cb = c_Blah()
%timeit -n 1000 b.op()
%timeit -n 1000 cb.op()
That is cumbersome because I had to rename all class-attribute/method accesses.
Also this does not solve my dilemma once I go down the hierarchy.
Any other ideas on how to approach this?
I mean incremental recoding in Cython.
cswiercz makes a good point:
from blah import Blah
from cblah import Blah as cBlah
This is OK, but supporting the class hierarchy will require modifications.
Will keep it open for other ideas.
You can use the import ... as ... syntax to rename definitions when importing them from modules. For example, if foo.py contains a function func and bar.pyx also contains a function func (perhaps this is you trying to write a Cython version of foo.func()), then you can do the following in your timing script:
# separate imports
from foo import func as func_py
from bar import func as func_cy
# use the code
func_py(2.0)
func_cy(2.0)
This way you can keep a meaningful naming scheme within the foo and bar modules.
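For the class-hierarchy case from the question, one option (just a sketch, assuming modules named blah and c_blah that both expose a class Blah) is to choose the implementation once at import time, so that subclasses only ever reference a single name:

# Hypothetical sketch: select the base-class implementation once, at import time.
USE_CYTHON = True  # flip to False to fall back to the pure-Python version

if USE_CYTHON:
    import pyximport
    pyximport.install()
    from c_blah import Blah   # Cython implementation (assumed module/class names)
else:
    from blah import Blah     # pure-Python implementation

class Derived(Blah):
    # Subclasses are written once and inherit from whichever Blah was selected,
    # so the rest of the hierarchy needs no renaming.
    def op_twice(self):
        self.op()
        self.op()

Flipping USE_CYTHON then switches the whole hierarchy between the two implementations without any renaming.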
I am working in a Google Colab notebook. There is one particular, computationally intensive piece of code that I'm writing in Cython in the same notebook. Within this piece of code, I want to call a function (defined in another cell of the same notebook, in Python).
Now, that function is heavily integrated with the rest of my pure Python code and rewriting and redefining things for Cython would not be possible.
My question is: How do I call that function written in Python, from another cell that is getting compiled in Cython?
Link I have already looked at:
Call python file with python imports from C using cython
Normally, you would put the whole functionality into a module and import it in the %%cython cell.
Another, less clean (but in the case of a notebook probably acceptable) way would be to import from __main__, e.g.:
[1]: def foo():
         print("I'm main foo")
and then:
[2]: %%cython
     def usefoo():
         from __main__ import foo
         foo()
and now:
[3]: usefoo()
I'm main foo
Another variant would be to import foo from __main__ slightly differently:
[2]: %%cython
     from __main__ import foo
     def usefoo2():
         foo()
There are two main differences:
If foo isn't (yet) defined in __main__, the second %%cython cell will fail. The first version will fail if foo is not defined, or is no longer defined, at the time usefoo is called.
If foo is changed in __main__, the first version will use the current version, while the second version will always use the version from the moment the %%cython cell was built (which might not be the same time the %%cython cell is run, due to caching). This can be quite confusing.
In the long run, this way of doing things is quite confusing and puzzling, so after a short try-out phase I would change to a more sustainable approach using dedicated modules.
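For completeness, here is a minimal sketch of that dedicated-module approach (helpers.py, foo and usefoo3 are placeholder names, not from the original answer). First write the helper out as a real module:

[4]: %%writefile helpers.py
     def foo():
         print("I'm module foo")

and compile against it in the next cell:

[5]: %%cython
     from helpers import foo

     def usefoo3():
         foo()

In Colab the notebook's working directory is normally on sys.path, so the compiled module can import the file it just wrote.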
Goal
I want to be able to import (in the __init__.py) all functions from every single file inside my package.
Usage
For example, in this folder structure:
manage.py
- scripts/
-- __init__.py
-- tests.py
-- deploy.py
I am currently doing the following:
manage.py:
from scripts import *
scripts/__init__.py:
from .tests import *
from .deploy import *
But every time I add another file to the package I have to add an import line to scripts/__init__.py, which is kind of annoying.
You can do it manually, but you shouldn't.
Why you really do not want to do this:
You'll end up with a namespace where understanding what is what, and where it came from, will be extremely hard, with the difficulty increasing as the size of the overall project does. Apart from being completely unintuitive for Python, think of anybody else who might view your code, or even worse, think about yourself re-reading it after a month and not remembering what's going on. You don't need that in your life.
In addition, any functions you expose to the importer that overlap with functions in other modules are going to get shadowed by the most recently imported one. As an example, think of two scripts that contain the same function foo() and watch what happens.
>>> from scrpt1 import *
>>> foo()
Script 1
>>> from scrpt2 import *
>>> foo()
Script 2
You don't need that in your life either, especially when it is so easy to avoid by being explicit.
Here are some related lines from the text contained in import this:
Explicit is better than implicit.
Be explicit about where your functions are defined. Don't "spaghetti" your code. You'll want to hit yourself in the future if you opt for a mash of everything in one place.
Special cases aren't special enough to break the rules.
Really self explanatory.
Namespaces are one honking great idea -- let's do more of those!
"more of those!", not less; don't miss out on how wonderful namespaces are. Python is based on them; segregating your code in different namespaces is the foundation of organizing code.
importlib allows you to import any Python module from a string name. You can automate this by going through the list of files in the package's path.
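A minimal sketch of that automation (using pkgutil to discover the submodules of the scripts/ package from the question; the re-export loop is just one way to do it):

# scripts/__init__.py -- discover and wildcard-import every submodule
import importlib
import pkgutil

for _finder, _name, _is_pkg in pkgutil.iter_modules(__path__):
    _module = importlib.import_module("." + _name, __name__)
    # re-export public names so `from scripts import *` keeps working
    for _attr in dir(_module):
        if not _attr.startswith("_"):
            globals()[_attr] = getattr(_module, _attr)

New files added to scripts/ are then picked up without touching __init__.py, at the cost of the namespace problems described above.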
It's more Pythonic to use __all__. Check here for more details.
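For example, __all__ in a submodule (a hypothetical tests.py) controls exactly what a wildcard import re-exports:

# scripts/tests.py -- only names listed in __all__ are pulled in by `from .tests import *`
__all__ = ["run_tests"]

def run_tests():
    print("running tests")

def _internal_helper():  # not exported by the wildcard import
    pass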
I build quite complex python apps, often with Django. To simplify inter-application interfaces I sometimes use service.py modules that abstract away from the models.
As these 'aggregate functionality', they frequently end up with circular imports which are easily eliminated by placing the import statements inside the service functions.
Is there a significant performance or memory cost associated with generally moving imports as close to their point of use as possible? For example, if I only use a particular imported name in one function in a file, it seems natural to place the import in that particular function rather than at the top of the file in its conventional place.
This issue is subtly different from this question because each import is in the function namespace.
The point at which you import a module is not expected to cause a performance penalty, if that's what you're worried about. Modules are singletons and will not be imported every single time an import statement is encountered. However, how you do the import, and subsequent attribute lookups, does have an impact.
For example, if you import math and then every time you need to use the sin(...) function you have to do math.sin(...), this will generally be slower than doing from math import sin and using sin(...) directly as the system does not have to keep looking up the function name within the module.
This lookup-penalty applies to anything that is accessed using the dot . and will be particularly noticeable in a loop. It's therefore advisable to get a local reference to something you might need to use/invoke frequently in a performance critical loop/section.
For example, using the original import math example, right before a critical loop, you could do something like this:
# ... within some function
sin = math.sin
for i in range(0, REALLY_BIG_NUMBER):
    x = sin(i)  # faster than: x = math.sin(i)
# ...
This is a trivial example, but note that you could do something similar with methods on other objects (e.g. lists, dictionaries, etc).
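For instance, the same trick applied to a method lookup (an illustrative snippet, not from the original answer):

result = []
append = result.append          # bind the bound method once
for i in range(1000000):
    append(i * i)               # avoids re-resolving result.append on every iteration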
I'm probably a bit more concerned about the circular imports you mention. If your intention is to "fix" circular imports by moving the import statements into more "local" places (e.g. within a specific function, or block of code, etc) you probably have a deeper issue that you need to address.
Personally, I'd keep the imports at the top of the module as it's normally done. Straying away from that pattern for no good reason is likely to make your code more difficult to go through because the dependencies of your module will not be immediately apparent (i.e. there're import statements scattered throughout the code instead of in a single location).
It might also make the circular dependency issue you seem to be having more difficult to debug and easier to fall into. After all, if the import is not listed at the top, someone might happily think your module A has no dependency on module B and then end up adding an import A in B, when A already has import B hidden in some deep dark corner.
Benchmark Sample
Here's a benchmark using the lookup notation:
>>> timeit('for i in range(0, 10000): x = math.sin(i)', setup='import math', number=50000)
89.7203312900001
And another benchmark not using the lookup notation:
>>> timeit('for i in range(0, 10000): x = sin(i)', setup='from math import sin', number=50000)
78.27029322999988
Here there's a 10+ second difference.
Note that your gain depends on how much time the program spends running this code -- i.e. a performance-critical section rather than sporadic function calls.
See this question.
Basically whenever you import a module, if it's been imported before it will use a cached value.
This means that there is a performance hit the first time the module is loaded, but once it has been loaded it is cached, and future imports of it are cheap.
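You can see the caching directly through sys.modules (a small illustrative snippet):

import sys
import math                        # first import: the module is loaded and cached

print('math' in sys.modules)       # True -- cached from now on
import math                        # subsequent imports just return the cached object
print(sys.modules['math'] is math) # True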
As ray said, importing specific functions is slightly faster:
1.62852311134 for sin()
1.89815092087 for math.sin()
using the following code
import math
from time import time

sin = math.sin

t1 = time()
for i in range(10000000):
    x = sin(i)
t2 = time()
for i in range(10000000):
    z = math.sin(i)
t3 = time()

print(t2 - t1)  # local sin() reference
print(t3 - t2)  # math.sin attribute lookup on every call
As per timeit, there is a significant cost to an import statement, even when the module is already imported in the same namespace:
$ python -m timeit -s 'import sys
def foo():
import sys
assert sys is not None
' -- 'foo()'
500000 loops, best of 5: 824 nsec per loop
$ python -m timeit -s 'import sys
def foo():
assert sys is not None
' -- 'foo()'
2000000 loops, best of 5: 96.3 nsec per loop
(Timing figures from Python 3.10.6 on Termux running on a phone.)
Instead of imports within functions, I've found that I can take advantage of Python's support for partially initialized modules and do a "tail import", pushing the import statement to the very bottom of the file (with a # isort:skip to get isort to leave it alone). This allows circular imports as long as the tail import is not required at module or class level and only at function or method level.
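A minimal sketch of that pattern, with hypothetical modules a and b importing each other:

# a.py
def helper_a():
    return "a"

def use_b():
    # b is only looked up at call time, after both modules have finished importing
    return b.helper_b()

# "tail import": everything b needs from a is already defined above,
# so the circular `import a` inside b sees a usable (partially initialized) module
import b  # isort:skip

# b.py
import a

def helper_b():
    return a.helper_a() + "b"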
I have several compiled Python modules; they are put into a single .so (to avoid runtime linking, since there are cross-module symbol dependencies), and a number of symlinks point to this .so:
foo.so -> liball.so
bar.so -> liball.so
liball.so
This way, I can do import foo (Python will call initfoo() defined in liball.so) or import bar (calls initbar()).
I am wondering if this approach will work on Windows?
Probably not, but you could achieve your goal with
import sys
import liball
sys.modules['foo'] = liball
sys.modules['bar'] = liball
if you need to import them at several places, or with
import liball as foo, liball as bar, liball
if you need that only at one place.
It might be, however, that the distinction between initfoo() and initbar() cannot be maintained and that both initializations must run, so that the single module effectively contains everything that both modules were meant to contain.
If foo partially contains the same symbols as bar, but with a different meaning, this approach won't work. But then you can just copy the file. This will occupy more disk space than needed, but that doesn't hurt so much, IMHO.
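For illustration, once sys.modules is patched as above, any later plain import picks up the cached module (assuming the asker's liball is importable):

import sys
import liball

sys.modules['foo'] = liball
sys.modules['bar'] = liball

import foo            # no foo.so on disk is needed; this returns the cached liball
assert foo is liball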
Suppose you have the following
b
b/__init__.py
b/c
b/c/__init__.py
b/c/d
b/c/d/__init__.py
In some python packages, if you import b, you only get the symbols defined in b. To access b.c, you have to explicitly import b.c or from b import c. In other words, you have to
import b
import b.c
import b.c.d
print(b.c.d)
In other cases I saw an automatic import of all the subpackages. This means that the following code does not produce an error
import b
print(b.c.d)
because b/__init__.py takes care of importing its subpackages.
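In that style the __init__.py files typically look something like this (hypothetical contents):

# b/__init__.py
from . import c        # importing b now also imports b.c

# b/c/__init__.py
from . import d        # importing b.c now also imports b.c.d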
I tend to prefer the first (explicit is better than implicit), and I have always used it, but are there cases where the second one is preferable to the first?
I like namespaces -- so I think that import b should only get what's in b itself (presumably in b/__init__.py). If there's a reason to segregate other functionality in b.c, b.c.d, or whatever, then just import b should not drag it all in -- if the "drag it all in" does happen, I think that suggests that the namespace separation was probably a bogus one to start with. Of course, there are examples even in the standard library (import os, then you can use os.path.join and the like), but they're ancient, by now essentially "grandfathered" things from way before the Python packaging system was mature and stable. In new code, I'd strongly recommend that a package should not drag its subpackages along for the ride when you import it. (Do import this at the Python prompt and contemplate the very last line it shows;-).
__all__ = [your vars, functions, classes]
Use the syntax above in package b's __init__.py to automatically load the things listed there. :)