I am trying to access the module docstring from within the same module it is defined. Practical example:
#!/usr/bin/env python3
"Module docstring that explains what the script does."
import argparse as ap
parser = ap.ArgumentParser(description=__doc__)
...
I have always used __doc__, but I later stumbled upon some code using sys.module[__name__].__doc__. The two objects appear to be the same, but I am wondering if they are always the same. In other words, is the following:
>>> sys.module[__name__].__doc__ is __doc__
True
always true? Can I safely use __doc__ in my example instead of sys.modules[__name__].__doc__?
In practice they will be the same unless you go out of your way to make them different.
If you assign to __name__ then you can make the expression refer to a different module:
"""My doc string"""
import sys
print(__doc__)
__name__='sys'
print(sys.modules[__name__].__doc__)
will print two different doc strings.
Alternatively, you could leave __name__ alone but delete your module from sys.modules and then import another module with the same __name__ (or in Python 2.x use reload() to reload the module after editing the docstring). If you held onto a reference to a function in the old module you could still call it and __doc__ will be the old value while sys.modules[__name__].__doc__ is the new one.
So they could be different, but only if you work at it.
Related
Below is a simple code example that may help to explain my question.
file_1.py
from functools import lru_cache
from file_2 import add_stuff, add_stats
#lru_cache()
def add(x, y):
return x + y
if __name__ == "__main__":
add(1, 2)
add(1, 2)
add(3, 4)
print(add.cache_info)
print(add.cache_info())
add_stuff(1, 2)
add_stuff(3, 4)
add_stats()
file_2.py
def add_stuff(x, y):
from file_1 import add
add(x, y)
def add_stats():
from file_1 import add
print(add.cache_info)
print(add.cache_info())
And the output looks like this:
<built-in method cache_info of functools._lru_cache_wrapper object at 0x017E9E48>
CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
<built-in method cache_info of functools._lru_cache_wrapper object at 0x017E9D40>
CacheInfo(hits=0, misses=2, maxsize=128, currsize=2)
When I use the function inside of the file it was defined in, the function object is different from when another file imports it. Which means that for things like lru_cache, if you didn't realize this, you could be populating two caches inside of your process/threads if you don't keep the cached functions inside of a different file from where they are used.
My question is, is this a python gotcha to look out for? Or is there documentation somewhere that I just never read that explains this more in depth? I looked at the lru_cache documentation, and this was not called out there as anything to be aware of.
When I use the function inside of the file it was defined in, the function object is different from when another file imports it.
Yes; there are two separate caches, because each is decorating a separate function object. The reason there are two separate function objects is because there are two separate modules created from the same source code.
One of these modules was created by from file_1 import add, which causes a module to be cached in the sys.modules with the key 'file_1' and a __name__ attribute of file_1. (Subsequent uses of import will look up this module in the cache).
The other one is created by running file_1.py as the main script. This causes a module with a __name__ attribute of __main__ to be created.
This is why and how the if __name__ == '__main__': trick works. The global variables available to a module - i.e., what you get by using globals() - come from attributes of the module object. Top-level scripts are also represented with module objects - they just aren't imported using import (although they are created using much of the same machinery, and cached; a '__main__' key will appear in sys.modules). That's where the information comes from, and thus why __name__ exists as a global variable in normal circumstances.
is this a python gotcha to look out for? Or is there documentation somewhere that I just never read that explains this more in depth? I looked at the lru_cache documentation, and this was not called out there as anything to be aware of.
It isn't explained in the lru_cache documentation because it isn't lru_cache's fault. It would happen with any decorator. In fact, it would happen with any code that makes the separate identity of the function objects relevant. For example, if we create this module_example.py:
def example():
print(example.module)
example.module = __name__
if __name__ == '__main__':
example()
(The reoccurrence of __name__ should make it obvious what is going on - although, of course, we could even just use __name__ directly in the function)
Now we test the code in interactive mode - run it, use the global function, and then import the module and use the imported function:
$ python -i module_example.py
__main__
>>> example()
__main__
>>> import module_example
>>> module_example.example()
module_example
>>> quit()
$
This is only a gotcha insofar as expecting a module to work as both the top-level code and as something importable, imposes some design considerations. Normally, if the code is intended to be imported, the "driver" code block (if any) will just do an informal test; or offer a simple, one-off UI for the module's functionality that doesn't care about consistency with an imported-module version of the same code.
Alternately put: the real problem here is a circular import. file_1 is indirectly importing itself to get at its own functionality, and it only "works" because of the implicit renaming of the module to __main__ the first time.
As a beginner, what I understood is that Python Standard Library (PSL) provides a lot of modules which provide a lot of functionalities, but still if I want to use those then I have to import the module, for example, sys, os etc. are PSL modules but still those need to be imported.
Now, I wonder if that is the case then how without importing anything I am able to use functions like print, list, len etc.? Is it that their "support is built-in into the interpreter"?
Yes. They're built-in functions (or in the case of list, a built-in class). You can explicitly import the __builtin__ module (Py2) or the builtins module (Py3) if you want qualified access to the names, but by default, those modules are searched whenever an attempt to access a global name doesn't find the name in the module globals. They're not normally needed though, per the docs:
This module is not normally accessed explicitly by most applications, but can be useful in modules that provide objects with the same name as a built-in value, but in which the built-in of that name is also needed.
The print function comes from the builtins module.
You can find its documentation here.
Here is an example session.
I first check what module print comes from,which is stored in its __module__ attribute.
Then, I import the builtins module, and checks if its print function is the same as the prefix-less print.
>>> print.__module__
'builtins'
>>> import builtins
>>> builtins.print("hello")
hello
>>> print is builtins.print
True
You should give the page on built-in functions a read
Quote:
The Python interpreter has a number of functions and types built into
it that are always available.
This is the contents of script_one.py:
x = "Hello World"
This is the contents of script_two.py:
from script_one import x
print(x)
Now, if I ran script_two.py the output would be:
>>> Hello World
What I need is a way to detect if x was imported.
This is what I imagine the source code of script_one.py would look like:
x = "Hello World"
if x.has_been_imported:
print("You've just imported \"x\"!")
Then if I ran script_two.py the output "should" be:
>>> Hello World
>>> You've just imported "x"!
What is this called, does this feature exist in Python 3 and how do you use it?
You can't. Effort expended on trying to detect this are a waste of time, I'm afraid.
Python imports consist of the following steps:
Check if the module is already loaded by looking at sys.modules.
If the module hasn't been loaded yet, load it. This creates a new module object that is added to sys.modules, containing all objects resulting from executing the top-level code.
Bind names in the importing namespace. How names are bound depends on the exact import variant chosen.
import module binds the name module to the sys.modules[module] object
import module as othername binds the name othername to the sys.modules[module] object
from module import attribute binds the name attribute to the sys.modules[module].attribute object
from module import attribute as othername binds the name othername to the sys.modules[module].attribute object
In this context it is important to realise that Python names are just references; all Python objects (including modules) live on a heap and stand or fall with the number of references to them. See this great article by Ned Batchelder on Python names if you need a primer on how this works.
Your question then can be interpreted in two ways:
You want to know the module has been imported. The moment code in the module is executed (like x = "Hello World"), it has been imported. All of it. Python doesn't load just x here, it's all or nothing.
You want to know if other code is using a specific name. You'd have to track what other references exist to the object. This is a mammoth task involving recursively checking the gc.get_referrers() object chain to see what other Python objects might now refer to x.
The latter goal is made the harder all the further in any of the following scenarios:
import script_one, then use script_one.x; references like these could be too short-lived for you to detect.
from script_one import x, then del x. Unless something else still references the same string object within the imported namespace, that reference is now gone and can't be detected anymore.
import sys; sys.modules['script_one'].x is a legitimate way of referencing the same string object, but does this count as an import?
import script_one, then list(vars(script_one).values()) would create a list of all objects defined in the module, but these references are indices in a list, not named. Does this count as an import?
Looks like it is impossible previously. But ever since python 3.7+ introduces __getattr__ on module level, looks like it is possible now. At least we can distinguish whether a variable is imported by from module import varable or import module; module.variable.
The idea is to detect the AST node in the previous frame, whether it is an Attribute:
script_one.py
def _variables():
# we have to define the variables
# so that it dosen't bypass __getattr__
return {'x': 'Hello world!'}
def __getattr__(name):
try:
out = _variables()[name]
except KeyError as kerr:
raise ImportError(kerr)
import ast, sys
from executing import Source
frame = sys._getframe(1)
node = Source.executing(frame).node
if node is None:
print('`x` is imported')
else:
print('`x` is accessed via `script_one.x`')
return out
script_two.py
from script_one import x
print(x)
# `x` is imported
# 'Hello world!'
import script_one
print(script_one.x)
# `x` is accessed via `script_one.x`
# 'Hello world!'
If I wanted to get the current module, e.g. to reload it, I would do:
import sys
sys.modules[__name__]
Is there a better way to do this (e.g. not involving __name__)? Better in this context means more idiomatic, more portable, more robust, or more...any of the other things we usually desire in our software.
I use python 2, but answers for python 3 will no doubt be useful to others.
There is no more idiomatic method to get the current module object from sys.modules than what you used.
__name__ is set by Python on import, essentially doing:
module_object = import_py_file(import_name)
module_object.__name__ = import_name
sys.modules[import_name] = module_object
so the __name__ reference is exactly what you want to use here.
I can make this code work, but I am still confused why it won't work the first way I tried.
I am practicing python because my thesis is going to be coded in it (doing some cool things with Arduino and PC interfaces). I'm trying to import a class from another file into my main program so that I can create objects. Both files are in the same directory. It's probably easier if you have a look at the code at this point.
#from ArduinoBot import *
#from ArduinoBot import ArduinoBot
import ArduinoBot
# Create ArduinoBot object
bot1 = ArduinoBot()
# Call toString inside bot1 object
bot1.toString()
input("Press enter to end.")
Here is the very basic ArduinoBot class
class ArduinoBot:
def toString(self):
print ("ArduinoBot toString")
Either of the first two commented out import statements will make this work, but not the last one, which to me seems the most intuitive and general. There's not a lot of code for stuff to go wrong here, it's a bit frustrating to be hitting these kind of finicky language specific quirks when I had heard some many good things about Python. Anyway I must be doing something wrong, but why doesn't the simple 'import ClassName' or 'import FileName' work?
Thank you for your help.
consider a file (example.py):
class foo(object):
pass
class bar(object):
pass
class example(object):
pass
Now in your main program, if you do:
import example
what should be imported from the file example.py? Just the class example? should the class foo come along too? The meaning would be too ambiguous if import module pulled the whole module's namespace directly into your current namespace.
The idea is that namespaces are wonderful. They let you know where the class/function/data came from. They also let you group related things together (or equivalently, they help you keep unrelated things separate!). A module sets up a namespace and you tell python exactly how you want to bring that namespace into the current context (namespace) by the way you use import.
from ... import * says -- bring everything in that module directly into my namespace.
from ... import ... as ... says, bring only the thing that I specify directly into my namespace, but give it a new name here.
Finally, import ... simply says bring that module into the current namespace, but keep it separate. This is the most common form in production code because of (at least) 2 reasons.
It prevents name clashes. You can have a local class named foo which won't conflict with the foo in example.py -- You get access to that via example.foo
It makes it easy to trace down which module a class came from for debugging.
consider:
from foo import *
from bar import *
a = AClass() #did this come from foo? bar? ... Hmmm...
In this case, to get access to the class example from example.py, you could also do:
import example
example_instance = example.example()
but you can also get foo:
foo_instance = example.foo()
The simple answer is that modules are things in Python. A module has its own status as a container for classes, functions, and other objects. When you do import ArduinoBot, you import the module. If you want things in that module -- classes, functions, etc. -- you have to explicitly say that you want them. You can either import them directly with from ArduinoBot import ..., or access them via the module with import ArduinoBot and then ArduinoBot.ArduinoBot.
Instead of working against this, you should leverage the container-ness of modules to allow you to group related stuff into a module. It may seem annoying when you only have one class in a file, but when you start putting multiple classes and functions in one file, you'll see that you don't actually want all that stuff being automatically imported when you do import module, because then everything from all modules would conflict with other things. The modules serve a useful function in separating different functionality.
For your example, the question you should ask yourself is: if the code is so simple and compact, why didn't you put it all in one file?
Import doesn't work quite the you think it does. It does work the way it is documented to work, so there's a very simple remedy for your problem, but nonetheless:
import ArduinoBot
This looks for a module (or package) on the import path, executes the module's code in a new namespace, and then binds the module object itself to the name ArduinoBot in the current namespace. This means a module global variable named ArduinoBot in the ArduinoBot module would now be accessible in the importing namespace as ArduinoBot.ArduinoBot.
from ArduinoBot import ArduinoBot
This loads and executes the module as above, but does not bind the module object to the name ArduinoBot. Instead, it looks for a module global variable ArduinoBot within the module, and binds whatever object that referred to the name ArduinoBot in the current namespace.
from ArduinoBot import *
Similarly to the above, this loads and executes a module without binding the module object to any name in the current namespace. It then looks for all module global variables, and binds them all to the same name in the current namespace.
This last form is very convenient for interactive work in the python shell, but generally considered bad style in actual development, because it's not clear what names it actually binds. Considering it imports everything global in the imported module, including any names that it imported at global scope, it very quickly becomes extremely difficult to know what names are in scope or where they came from if you use this style pervasively.
The module itself is an object. The last approach does in fact work, if you access your class as a member of the module. Either if the following will work, and either may be appropriate, depending on what else you need from the imported items:
from my_module import MyClass
foo = MyClass()
or
import my_module
foo = my_module.MyClass()
As mentioned in the comments, your module and class usually don't have the same name in python. That's more a Java thing, and can sometimes lead to a little confusion here.