Which style is preferable?
Style A:
def foo():
import some_module
some_module.something
Style B:
import some_module
def foo():
some_module.something
Assume that some_module is not used elsewhere in the code, only inside this function.
Indeed, as already noted, it's usually best to follow the PEP 8 recommendation and do your imports at the top. There are some exceptions though. The key to understanding them lies in your embedded question in your second paragraph: "at what stage does the import ... happen?"
Import is actually an executable statement. When you import a module, all the executable statements in the module run. "def" is also an executable statement; its execution causes the defined name to be associated with the (already-compiled) code. So if you have:
def f():
import something
return None
in a module that you import, the (compiled) import and return statements get associated with the name "f" at that point. When you run f(), the import statement there runs.
If you defer importing something that is "very big" or "heavy", and then you never run the function (in this case f), the import never happens. This saves time (and some space as well). Of course, once you actually call f(), the import happens (if it has already happened once Python uses the cached result, but it still has to check) so you lose your time advantage.
Hence, as a rule of thumb, "import everything at the top" until after you have done a lot of profiling and discovered that importing "hugething" is wasting a lot of time in 90% of your runs, vs saving a little time in 10% of them.
PEP 8 recommends that all imports happen at the top of the module. All names, including those bound to modules, are searched for in the local, non-local, global, and built-in scopes, in that order.
Related
I'd like to dynamically create a module from a dictionary, and I'm wondering if adding an element to sys.modules is really the best way to do this. EG
context = { a: 1, b: 2 }
import types
test_context_module = types.ModuleType('TestContext', 'Module created to provide a context for tests')
test_context_module.__dict__.update(context)
import sys
sys.modules['TestContext'] = test_context_module
My immediate goal in this regard is to be able to provide a context for timing test execution:
import timeit
timeit.Timer('a + b', 'from TestContext import *')
It seems that there are other ways to do this, since the Timer constructor takes objects as well as strings. I'm still interested in learning how to do this though, since a) it has other potential applications; and b) I'm not sure exactly how to use objects with the Timer constructor; doing so may prove to be less appropriate than this approach in some circumstances.
EDITS/REVELATIONS/PHOOEYS/EUREKA:
I've realized that the example code relating to running timing tests won't actually work, because import * only works at the module level, and the context in which that statement is executed is that of a function in the testit module. In other words, the globals dictionary used when executing that code is that of __main__, since that's where I was when I wrote the code in the interactive shell. So that rationale for figuring this out is a bit botched, but it's still a valid question.
I've discovered that the code run in the first set of examples has the undesirable effect that the namespace in which the newly created module's code executes is that of the module in which it was declared, not its own module. This is like way weird, and could lead to all sorts of unexpected rattlesnakeic sketchiness. So I'm pretty sure that this is not how this sort of thing is meant to be done, if it is in fact something that the Guido doth shine upon.
The similar-but-subtly-different case of dynamically loading a module from a file that is not in python's include path is quite easily accomplished using imp.load_source('NewModuleName', 'path/to/module/module_to_load.py'). This does load the module into sys.modules. However this doesn't really answer my question, because really, what if you're running python on an embedded platform with no filesystem?
I'm battling a considerable case of information overload at the moment, so I could be mistaken, but there doesn't seem to be anything in the imp module that's capable of this.
But the question, essentially, at this point is how to set the global (ie module) context for an object. Maybe I should ask that more specifically? And at a larger scope, how to get Python to do this while shoehorning objects into a given module?
Hmm, well one thing I can tell you is that the timeit function actually executes its code using the module's global variables. So in your example, you could write
import timeit
timeit.a = 1
timeit.b = 2
timeit.Timer('a + b').timeit()
and it would work. But that doesn't address your more general problem of defining a module dynamically.
Regarding the module definition problem, it's definitely possible and I think you've stumbled on to pretty much the best way to do it. For reference, the gist of what goes on when Python imports a module is basically the following:
module = imp.new_module(name)
execfile(file, module.__dict__)
That's kind of the same thing you do, except that you load the contents of the module from an existing dictionary instead of a file. (I don't know of any difference between types.ModuleType and imp.new_module other than the docstring, so you can probably use them interchangeably) What you're doing is somewhat akin to writing your own importer, and when you do that, you can certainly expect to mess with sys.modules.
As an aside, even if your import * thing was legal within a function, you might still have problems because oddly enough, the statement you pass to the Timer doesn't seem to recognize its own local variables. I invoked a bit of Python voodoo by the name of extract_context() (it's a function I wrote) to set a and b at the local scope and ran
print timeit.Timer('print locals(); a + b', 'sys.modules["__main__"].extract_context()').timeit()
Sure enough, the printout of locals() included a and b:
{'a': 1, 'b': 2, '_timer': <built-in function time>, '_it': repeat(None, 999999), '_t0': 1277378305.3572791, '_i': None}
but it still complained NameError: global name 'a' is not defined. Weird.
I'm writing some code for an esp8266 micro controller using micro-python and it has some different class as well as some additional methods in the standard built in classes. To allow me to debug on my desktop I've built some helper classes so that the code will run. However I've run into a snag with micro-pythons time function which has a time.sleep_ms method since the standard time.sleep method on micropython does not accept floats. I tried using the following code to extend the built in time class but it fails to import properly. Any thoughts?
class time(time):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
def sleep_ms(self, ms):
super().sleep(ms/1000)
This code exists in a file time.py. Secondly I know I'll have issues with having to import time.time that I would like to fix. I also realize I could call this something else and put traps for it in my micro controller code however I would like to avoid any special functions in what's loaded into the controller to save space and cycles.
You're not trying to override a class, you're trying to monkey-patch a module.
First off, if your module is named time.py, it will never be loaded in preference to the built-in time module. Truly built-in (as in compiled into the interpreter core, not just C extension modules that ship with CPython) modules are special, they are always loaded without checking sys.path, so you can't even attempt to shadow the time module, even if you wanted to (you generally don't, and doing so is incredibly ugly). In this case, the built-in time module shadows you; you can't import your module under the plain name time at all, because the built-in will be found without even looking at sys.path.
Secondly, assuming you use a different name and import it for the sole purpose of monkey-patching time (or do something terrible like adding the monkey patch to a custom sitecustomize module, it's not trivial to make the function truly native to the monkey-patched module (defining it in any normal way gives it a scope of the module where it was defined, not the same scope as other functions from the time module). If you don't need it to be "truly" defined as part of time, the simplest approach is just:
import time
def sleep_ms(ms):
return time.sleep(ms / 1000)
time.sleep_ms = sleep_ms
Of course, as mentioned, sleep_ms is still part of your module, and carries your module's scope around with it (that's why you do time.sleep, not just sleep; you could do from time import sleep to avoid qualifying it, but it's still a local alias that might not match time.sleep if someone else monkey-patches time.sleep later).
If you want to make it behave like it's part of the time module, so you can reference arbitrary things in time's namespace without qualification and always see the current function in time, you need to use eval to compile your code in time's scope:
import time
# Compile a string of the function's source to a code object that's not
# attached to any scope at all
# The filename argument is garbage, it's just for exception traceback
# reporting and the like
code = compile('def sleep_ms(ms): sleep(ms / 1000)', 'time.py', 'exec')
# eval the compiled code with a scope of the globals of the time module
# which both binds it to the time module's scope, and inserts the newly
# defined function directly into the time module's globals without
# defining it in your own module at all
eval(code, vars(time))
del code, time # May as well leave your monkey-patch module completely empty
Which style is preferable?
Style A:
def foo():
import some_module
some_module.something
Style B:
import some_module
def foo():
some_module.something
Assume that some_module is not used elsewhere in the code, only inside this function.
Indeed, as already noted, it's usually best to follow the PEP 8 recommendation and do your imports at the top. There are some exceptions though. The key to understanding them lies in your embedded question in your second paragraph: "at what stage does the import ... happen?"
Import is actually an executable statement. When you import a module, all the executable statements in the module run. "def" is also an executable statement; its execution causes the defined name to be associated with the (already-compiled) code. So if you have:
def f():
import something
return None
in a module that you import, the (compiled) import and return statements get associated with the name "f" at that point. When you run f(), the import statement there runs.
If you defer importing something that is "very big" or "heavy", and then you never run the function (in this case f), the import never happens. This saves time (and some space as well). Of course, once you actually call f(), the import happens (if it has already happened once Python uses the cached result, but it still has to check) so you lose your time advantage.
Hence, as a rule of thumb, "import everything at the top" until after you have done a lot of profiling and discovered that importing "hugething" is wasting a lot of time in 90% of your runs, vs saving a little time in 10% of them.
PEP 8 recommends that all imports happen at the top of the module. All names, including those bound to modules, are searched for in the local, non-local, global, and built-in scopes, in that order.
When using many IDEs that support autocompletion with Python, things like this will show warnings, which I find annoying:
from eventlet.green.httplib import BadStatusLine
When switching to:
from eventlet.green.httplib import *
The warnings go away. What's the benefit to limiting imports to a specific set of types you'll use? Is the parsing faster? Reduces collisions? What other point is there? It seems the state of python IDEs and the nature of the typing system makes it hard for many IDEs to fully get right when a type import works and when it doesn't.
By typing from foo import *, you import all the names defined in foo into the global namespace. This is bad practice because you could have name clashes both with other modules and with built-ins.
For example, consider a module foo
#foo.py
def open(something):
pass
and a module bar:
#bar.py
def open(something_else):
pass
Now, from foo import * hides the built-in function open() which means that any calls to open() now refer to foo.open() rather than the built-in. Worse, if you then have from bar import *, the function open() in bar now hides both the built-in and the function imported from foo.
In the example above, from foo import open is equally shadowing the built-in function, but one glance at the code tells you why you can't open files for IO anymore.
This is why you should import only specific names, ensuring that you know what names are imported. Alternatively, you could use fully qualified names (import foo; foo.open(), which is perfectly safe).
EDIT: Just as a note, this can be horribly compounded if the module you're importing also uses from x import *. In this case, not only do you typically import all the stuff in the module foo, but also all the stuff in the module x into the global namespace. This can very quickly turn into an absolute mess.
It reduces collisions with user-defined types, it reduces coupling and it's self-documenting, since it makes clear from the outset of the module which classes are coming from libraries (so the rest must be user-defined). The parsing is not faster, at least not in CPython: an imported module must be read in its entirety to look for the classes/functions being imported.
(I must admit that I never use an IDE.)
Let's assume I have a main script, main.py, that imports another python file with import coolfunctions and another: import chores
Now, suppose coolfunctions also uses stuff from chores, hence I declare import chores inside coolfunctions.
Since both main.py, and coolfunctions import chores ~ is this redundant? Is there any other way of doing this? Am I doing it correctly?
I'm confused about how python projects should be structured in general. I have a "conf.py" file, that I import for a bunch of variables ~ is this a module or not? I load this conf file in multiple places as well.
If two modules want to use chores, then each one must import chores (or some equivalent import). Each import creates a name binding only in the namespace of the module that does the import; that is, import's namespace effect is local to a module's namespace.
This is good, because by looking at a module's code you can (barring pathological cases) know where each name is bound to by the import statements that explicitly bind modules or module attributes to names. Imports made in other modules won't affect this module's namespace.
Each module X should import all (and only) the modules Y, Z, T, ... whose functionality it requires, without any worry about what other modules Fee, Fie, Foo ... (if any) may have already done part or all of those imports, or may be going to do so in the future.
It would make a module extremely fragile (indeed, it would be the very opposite of modularity!) if each module had to worry about such subtle, "covert-channel" effects.
What other modules Y, Z, T, ..., each module X chooses to import (if any) is part of X's implementation details, and shouldn't concern anybody except the developers who are coding, testing, or maintaining X.
In order to ensure that this is the case, and that this clearly-best strategy of decoupling can and will fully be followed by sane code, Python "caches" modules as they get imported: a module is "loaded" only once per run of a program, the first time anybody imports it (or anything from inside it) -- all other imports use the same object obtained by that first loading, which Python keeps in a cache (which is specified as being the dict sys.modules, but you need to know that detail only for somewhat-advanced programming techniques... don't worry about it, 98.7% of the time -- just remember that "import is cheap"!-).
Sure, a conf.py that you use from several other modules via import conf is definitely a module (you may think you're loading it multiple times, but you aren't unless you're using pretty advanced and deliberate techniques indeed for the purpose) -- why shouldn't it be?
No, this isn't redundant - it's fine to import chores in both the main module and coolfunctions.
The exact import mechanics of Python are complex (for example, module imports are only done once, meaning in your case that the actual parsing and loading of the chores module will only happen once, which is a nice optimization) but in general you shouldn't worry about it because it just works.
Each Python file is a module, so your conf.py is also a module.
It is always the best practice to import all necessary modules in the file that uses them. Take for example:
A.py contains: import coolfunctions
B.py contains: import A
Main.py contains: import B and uses functions that are defined in A.py (this is possible because by importing B, Main.py has imported everything that B imports)
If in the future, you change B.py to function without needing to import A.py and therefore remove the import A, then your Main.py will suffer the loss of not having imported A.