Best practice for lazy loading Python modules - python

Occasionally I want lazy module loading in Python. Usually because I want to keep runtime requirements or start-up times low and splitting the code into sub-modules would be cumbersome. A typical use case and my currently preferred implementation is this:
jinja2 = None
class Handler(...):
...
def render_with_jinja2(self, values, template_name):
global jinja2
if not jinja2:
import jinja2
env = jinja2.Environment(...)
...
I wonder: is there a canonical/better way to implement lazy module loading?

There's no reason for you to keep track of imports manually -- the VM maintains a list of modules that have already been imported, and any subsequent attempts to import that module result in a quick dict lookup in sys.modules and nothing else.
The difference between your code and
def render_with_jinja2(self, values, template_name):
import jinja2
env = jinja2.Environment(...)
is zero -- when we hit that code, if jinja2 hasn't been imported, it is imported then. If it already has been, execution continues on.

class Handler(...):
...
def render_with_jinja2(self, values, template_name):
import jinja2
env = jinja2.Environment(...)
...
There's no need to cache the imported module; Python does that already.

The other answers have covered the actual details but if you are interested in a lazy loading library, check out apipkg which is part of the py package (py.test fame).

Nice pattern from sqlalchemy: dependency injection:
#util.dependencies("sqlalchemy.orm.query")
def merge_result(query, *args):
#...
query.Query(...)
Instead of declaring all "import" statements at the top of the module, it will only import a module when it's actually needed by a function.
This can resolve circular dependency problems.

Related

How to define variables in `__init__.py` , so that when module in its package is imported, they're available to modules of another package? [duplicate]

I want to define a constant that should be available in all of the submodules of a package. I've thought that the best place would be in in the __init__.py file of the root package. But I don't know how to do this. Suppose I have a few subpackages and each with several modules. How can I access that variable from these modules?
Of course, if this is totally wrong, and there is a better alternative, I'd like to know it.
You should be able to put them in __init__.py. This is done all the time.
mypackage/__init__.py:
MY_CONSTANT = 42
mypackage/mymodule.py:
from mypackage import MY_CONSTANT
print "my constant is", MY_CONSTANT
Then, import mymodule:
>>> from mypackage import mymodule
my constant is 42
Still, if you do have constants, it would be reasonable (best practices, probably) to put them in a separate module (constants.py, config.py, ...) and then if you want them in the package namespace, import them.
mypackage/__init__.py:
from mypackage.constants import *
Still, this doesn't automatically include the constants in the namespaces of the package modules. Each of the modules in the package will still have to import constants explicitly either from mypackage or from mypackage.constants.
You cannot do that. You will have to explicitely import your constants into each individual module's namespace. The best way to achieve this is to define your constants in a "config" module and import it everywhere you require it:
# mypackage/config.py
MY_CONST = 17
# mypackage/main.py
from mypackage.config import *
You can define global variables from anywhere, but it is a really bad idea. import the __builtin__ module and modify or add attributes to this modules, and suddenly you have new builtin constants or functions. In fact, when my application installs gettext, I get the _() function in all my modules, without importing anything. So this is possible, but of course only for Application-type projects, not for reusable packages or modules.
And I guess no one would recommend this practice anyway. What's wrong with a namespace? Said application has the version module, so that I have "global" variables available like version.VERSION, version.PACKAGE_NAME etc.
Just wanted to add that constants can be employed using a config.ini file and parsed in the script using the configparser library. This way you could have constants for multiple circumstances. For instance if you had parameter constants for two separate url requests just label them like so:
mymodule/config.ini
[request0]
conn = 'admin#localhost'
pass = 'admin'
...
[request1]
conn = 'barney#localhost'
pass = 'dinosaur'
...
I found the documentation on the Python website very helpful. I am not sure if there are any differences between Python 2 and 3 so here are the links to both:
For Python 3: https://docs.python.org/3/library/configparser.html#module-configparser
For Python 2: https://docs.python.org/2/library/configparser.html#module-configparser

Making util file not accessible in python

I am building a python library. The functions I want available for users are in stemmer.py. Stemmer.py uses stemmerutil.py
I was wondering whether there is a way to make stemmerutil.py not accessible to users.
If you want to hide implementation details from your users, there are two routes that you can go. The first uses conventions to signal what is and isn't part of the public API, and the other is a hack.
The convention for declaring an API within a python library is to add all classes/functions/names that should be exposed into an __all__-list of the topmost __init__.py. It doesn't do that many useful things, its main purpose nowadays is a symbolic "please use this and nothing else". Yours would probably look somewhat like this:
urdu/urdu/__init__.py
from urdu.stemmer import Foo, Bar, Baz
__all__ = [Foo, Bar, Baz]
To emphasize the point, you can also give all definitions within stemmerUtil.py an underscore before their name, e.g. def privateFunc(): ... becomes def _privateFunc(): ...
But you can also just hide the code from the interpreter by making it a resource instead of a module within the package and loading it dynamically. This is a hack, and probably a bad idea, but it is technically possible.
First, you rename stemmerUtil.py to just stemmerUtil - now it is no longer a python module and can't be imported with the import keyword. Next, update this line in stemmer.py
import stemmerUtil
with
import importlib.util
import importlib.resources
# in python3.7 and lower, this is importlib_resources and needs to be installed first
stemmer_util_spec = importlib.util.spec_from_loader("stemmerUtil", loader=None)
stemmerUtil = importlib.util.module_from_spec(stemmer_util_spec)
with importlib.resources.path("urdu", "stemmerUtil") as stemmer_util_path:
with open(stemmer_util_path) as stemmer_util_file:
stemmer_util_code = stemmer_util_file.read()
exec(stemmer_util_code, stemmerUtil.__dict__)
After running this code, you can use the stemmerUtil module as if you had imported it, but it is invisible to anyone who installed your package - unless they run this exact code as well.
But as I said, if you just want to communicate to your users which part of your package is the public API, the first solution is vastly preferable.

Raise a custom exception on import

Short question: I have a module with objects. How can I do that if someone imports an object from my module, my specified exception is raised?
What I want to do: I write an architectural framework. A class for output depends on jinja2 external library. I want the framework to be usable without this dependency as well. In the package's __init__.py I write conditional import of my class RenderLaTeX (if jinja2 is available, the class is imported, otherwise not).
The problem with this approach is that I have some code which uses this class RenderLaTeX, but when I run it on a fresh setup, I receive an error like Import error: no class RenderLaTeX could be imported from output. This error is pretty unexpected and ununderstandable before I recall that jinja2 must be installed beforehand.
I thought about this solution: if the class is not available, __init__.py can create a string with this name. If a user tries to instantiate this object with the usual class constructor, they'll get a more meaningful error. Unfortunately, simple import
from output import RenderLaTeX
won't raise an error in this case.
What would you suggest, hopefully with the description of benefits and drawbacks?
Important UPD: I package my classes in modules and import them to the module via __init__.py, so that I import 'from lena.flow import ReadROOTFile', rather than 'from lena.flow.read_root_file import ReadROOTFile.'
When Python imports a module all of the code inside the file from which you are importing is executed.
If your RenderLaTeX class is therefore placed into a seperate file you can freely add logic which would prevent it from being imported (or run) if required dependencies are missing.
For example:
try:
import somethingidonthave
except ImportError:
raise Exception('You need this module!')
class RenderLaTeX(object):
pass
You can also add any custom message you want to the exception to better describe the error. This should work in both Python2 and Python3.
After a year of thinking, the solution appeared.
First of all, I think that this is pretty meaningless to overwrite an exception's type. The only good way would be to add a useful message for a missing import.
Second, I think that the syntax
from framework.renderers import MyRenderer
is really better than
from framework.renderers.my_renderer import MyRenderer
because it hides implementation details and requires less code from user (I updated my question to reflect that). For the former syntax to work, I have to import MyRenderer in __init__.py in the module.
Now, in my_renderer.py I would usually import third-party packages with
import huge_specific_library
in the header. This syntax is required by PEP 8. However, this would make the whole framework.renderers module depend on huge_specific_library.
The solution for that is to violate PEP 8 and import the library inside the class itself:
class MyRenderer():
def __init__(self):
import huge_specific_library
# ... use that...
Here you can catch the exception if that is important, change its message, etc. There is another benefit for that: there exist guides how to reduce import time, and they propose this solution (I read them a long time ago and forgot). Large modules require some time to be loaded. If you follow PEP 8 Style Guide (I still think that you usually should), this may lead to large delays just to make all imports to your program, not having done anything useful yet.
The only caveat is this: if you import the library in __init__, you should also import that to other class methods that use it, otherwise it won't be visible there.
For those who still doubt, I must add that since Python imports are cached, this doesn't affect performance if your method that uses import is not called too often.

python: is there a disadvantage to from package import * besides namespace collision

I'm creating a class to extend a package, and prior to class instantiation I don't know which subset of the package's namespace I need. I've been careful about avoiding namespace conflicts in my code, so, does
from package import *
create problems besides name conflicts?
Is it better to examine the class's input and import only the names I need (at runtime) in the __init__ ??
Can python import from a set [] ?
does
for name in [namespace,namespace]:
from package import name
make any sense?
I hope this question doesn't seem like unnecessary hand-ringing, i'm just super new to python and don't want to do the one thing every 'beginnger's guide' says not to do (from pkg import * ) unless I'm sure there's no alternative.
thoughts, advice welcome.
In order:
It does not create other problems - however, name conflicts can be much more of a problem than you'd expect.
Definitely defer your imports if you can. Even though Python variable scoping is simplistic, you also gain the benefit of not having to import the module if the functionality that needs it never gets called.
I don't know what you mean. Square brackets are used to make lists, not sets. You can import multiple names from a module in one line - just use a comma-delimited list:
from awesome_module import spam, ham, eggs, baked_beans
# awesome_module defines lots of other names, but they aren't pulled in.
No, that won't do what you want - name is an identifier, and as such, each time through the loop the code will attempt to import the name name, and not the name that corresponds to the string referred to by the name variable.
However, you can get this kind of "dynamic import" effect, using the __import__ function. Consult the documentation for more information, and make sure you have a real reason for using it first. We get into some pretty advanced uses of the language here pretty quickly, and it usually isn't as necessary as it first appears. Don't get too clever. We hates them tricksy hobbitses.
When importing * you get everything in the module dumped straight into your namespace. This is not always a good thing as you could accentually overwrite something like;
from time import *
sleep = None
This would render the time.sleep function useless...
The other way of taking functions, variables and classes from a module would be saying
from time import sleep
This is a nicer way but often the best way is to just import the module and reference the module directly like
import time
time.sleep(3)
you can import like from PIL import Image, ImageDraw
what is imported by from x import * is limited to the list __all__ in x if it exists
importing at runtime if the module name isn't know or fixed in the code must be done with __import__ but you shouldn't have to do that
This syntax constructions help you to avoid any name collision:
from package import somename as another_name
import package as another_package_name

Can I use __init__.py to define global variables?

I want to define a constant that should be available in all of the submodules of a package. I've thought that the best place would be in in the __init__.py file of the root package. But I don't know how to do this. Suppose I have a few subpackages and each with several modules. How can I access that variable from these modules?
Of course, if this is totally wrong, and there is a better alternative, I'd like to know it.
You should be able to put them in __init__.py. This is done all the time.
mypackage/__init__.py:
MY_CONSTANT = 42
mypackage/mymodule.py:
from mypackage import MY_CONSTANT
print "my constant is", MY_CONSTANT
Then, import mymodule:
>>> from mypackage import mymodule
my constant is 42
Still, if you do have constants, it would be reasonable (best practices, probably) to put them in a separate module (constants.py, config.py, ...) and then if you want them in the package namespace, import them.
mypackage/__init__.py:
from mypackage.constants import *
Still, this doesn't automatically include the constants in the namespaces of the package modules. Each of the modules in the package will still have to import constants explicitly either from mypackage or from mypackage.constants.
You cannot do that. You will have to explicitely import your constants into each individual module's namespace. The best way to achieve this is to define your constants in a "config" module and import it everywhere you require it:
# mypackage/config.py
MY_CONST = 17
# mypackage/main.py
from mypackage.config import *
You can define global variables from anywhere, but it is a really bad idea. import the __builtin__ module and modify or add attributes to this modules, and suddenly you have new builtin constants or functions. In fact, when my application installs gettext, I get the _() function in all my modules, without importing anything. So this is possible, but of course only for Application-type projects, not for reusable packages or modules.
And I guess no one would recommend this practice anyway. What's wrong with a namespace? Said application has the version module, so that I have "global" variables available like version.VERSION, version.PACKAGE_NAME etc.
Just wanted to add that constants can be employed using a config.ini file and parsed in the script using the configparser library. This way you could have constants for multiple circumstances. For instance if you had parameter constants for two separate url requests just label them like so:
mymodule/config.ini
[request0]
conn = 'admin#localhost'
pass = 'admin'
...
[request1]
conn = 'barney#localhost'
pass = 'dinosaur'
...
I found the documentation on the Python website very helpful. I am not sure if there are any differences between Python 2 and 3 so here are the links to both:
For Python 3: https://docs.python.org/3/library/configparser.html#module-configparser
For Python 2: https://docs.python.org/2/library/configparser.html#module-configparser

Categories