main.py
#main.py
import main
print('Hello')
Output:
Hello
Hello
I believe that when it comes to the line import main, at that time, main is registered in sys.modules and hence the import statement of another script - which I believe, is not a part of __main__ - is not executed. Can someone please tell me whether I understand it correctly? If not, please give an explanation.
Let's add a little debugging output:
import sys
print([key for key in sys.modules.keys() if 'main' in key])
import main
It prints:
['__main__']
['__main__', 'main']
Why is that?
If you run a module as a script, it is not added to sys.modules under its own name. Instead it is always registered as __main__.
If you then import the module by its name (main), that name is not yet present in sys.modules. As a result the module is imported again: its code is executed a second time and the new module object is stored in sys.modules under its name.
On executing main.py it will print ['__main__'] and on the re-import it will print both module names: ['__main__', 'main'].
This implies one rule: try not to import the module you are running anywhere in your code.
It only prints it twice because a module is only actually loaded once under a given name. This prevents possible unbounded recursion. So your print statement gets executed once by the imported module and once by the main program.
Since you're importing main inside main, the print statement is executed twice. That's how Python works.
I know that when we do 'import module_name', then it gets loaded only once, irrespective of the number of times the code passes through the import statement.
But if we move the import statement into a function, then for each function call does the module get re-loaded? If not, then why is it a good practice to import a module at the top of the file, instead of in function?
Does this behavior change for a multi threaded or multi process app?
It does not get loaded every time.
Proof:
file.py:
print('hello')
file2.py:
def a():
    import file

a()
a()
Output:
hello
Then why put it on the top?:
Because writing the imports inside a function will cause calls to that function to take slightly longer.
I know that when we do 'import module_name', then it gets loaded only once, irrespective of the number of times the code passes through the import statement.
Right!
But if we move the import statement into a function, then for each function call does the module get re-loaded?
No. But if you want, you can reload it explicitly, something like this:
import importlib
importlib.reload(target_module)
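To make the difference between a repeated import and an explicit reload concrete, here is a minimal sketch. It writes a throwaway module to a temporary directory (the name reload_demo_mod is made up for this example) and counts how often the module body actually runs.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path


def count_loads():
    """Count how often the body of a throwaway module is executed."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical module whose body prints a marker each time it runs.
        Path(tmp, "reload_demo_mod.py").write_text("print('loaded')\n")
        sys.path.insert(0, tmp)
        captured = io.StringIO()
        try:
            with contextlib.redirect_stdout(captured):
                import reload_demo_mod             # first import: body runs
                import reload_demo_mod             # found in sys.modules: body does not run
                importlib.reload(reload_demo_mod)  # explicit reload: body runs again
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("reload_demo_mod", None)
        return captured.getvalue().count("loaded")


load_count = count_loads()
print(load_count)  # 2
```

The second import statement only looks the module up in sys.modules, so the body runs twice in total: once for the first import and once for the reload.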
If not, then why is it a good practice to import a module at the top of the file, instead of in function?
When Python imports a module, it first checks the module registry (sys.modules) to see if the module is already imported. If that’s the case, Python uses the existing module object as is.
Even though it does not get reloaded, still it has to check if this module is already imported or not. So, there is some extra work done each time the function is called which is unnecessary.
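If you want to see how large that extra work is on your machine, a rough sketch with timeit could look like the following. The function names are invented for the example, and the numbers will vary, so treat it as a way to measure rather than as a result.

```python
import math
import timeit


def sqrt_top_level(x):
    # Uses the module object that was bound once at the top of the file.
    return math.sqrt(x)


def sqrt_local_import(x):
    # The import statement runs on every call. It finds "math" in
    # sys.modules and binds a local name; the module is not reloaded.
    import math as local_math
    return local_math.sqrt(x)


calls = 100_000
top_seconds = timeit.timeit(lambda: sqrt_top_level(2.0), number=calls)
local_seconds = timeit.timeit(lambda: sqrt_local_import(2.0), number=calls)
print("top-level import:   %.4f s" % top_seconds)
print("in-function import: %.4f s" % local_seconds)
```

Both functions return the same value; only the per-call lookup differs.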
It doesn't get reloaded after every function call and threading does not change this behavior. Here's how I tested it:
test.py:
print("Loaded")
testing.py:
import _thread

def call():
    import test

for i in range(10):
    call()

_thread.start_new_thread(call, ())
_thread.start_new_thread(call, ())
OUTPUT:
Loaded
To answer your second question, if you import the module at the top of the file, the module will be imported for all functions within the python file. This saves you from having to import the same module in multiple functions if they use the same module.
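A variant of the threading test above, as a sketch: it uses threading.Thread with join() so the main thread waits for the workers, and it counts the load messages instead of eyeballing them. The module name thread_demo_mod and the helper are made up for the example.

```python
import contextlib
import io
import sys
import tempfile
import threading
from pathlib import Path


def count_loads_with_threads(thread_count=4):
    """Import a throwaway module from several threads; count body executions."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "thread_demo_mod.py").write_text("print('Loaded')\n")
        sys.path.insert(0, tmp)
        captured = io.StringIO()

        def call():
            import thread_demo_mod  # each thread executes the import statement

        try:
            with contextlib.redirect_stdout(captured):
                threads = [threading.Thread(target=call) for _ in range(thread_count)]
                for thread in threads:
                    thread.start()
                for thread in threads:
                    thread.join()  # wait, so no import is cut off at exit
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("thread_demo_mod", None)
        return captured.getvalue().count("Loaded")


thread_load_count = count_loads_with_threads()
print(thread_load_count)  # 1
```

Separate processes are a different story: each process has its own sys.modules, so each one loads the module for itself.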
I am new to Python and am trying to create a module that fetches a specific variable from an active module, preferably as read-only. I have tried importing the file in test2, but the print statement displays the length as 0. I can't understand why it is not able to get the current state of the variable and only reads the initial value.
Below is what I have tried, any help would be greatly appreciated.
Thanks.
test1.py
from datetime import datetime, timedelta
import time

data = []
stop = datetime.now() + timedelta(minutes=5)
while datetime.now() < stop:
    time.sleep(1)
    data.append(datetime.now().time())
test2.py:
from test1 import *
print(len(data))
When you wrap statements in:
if __name__ == "__main__":
The code within only gets executed when you execute that specific module from the Python interpreter. So your Main function won't get executed, and consequently your data variable won't be initialized, unless you run:
python test2.py
You can actually just remove that clause and it will work as expected.
To preface, I think I may have figured out how to get this code working (based on Changing module variables after import), but my question is really about why the following behavior occurs so I can understand what to not do in the future.
I have three files. The first is mod1.py:
# mod1.py
import mod2

var1 = None

def func1A():
    global var1
    var1 = 'A'
    mod2.func2()

def func1B():
    global var1
    print(var1)

if __name__ == '__main__':
    func1A()
Next I have mod2.py:
# mod2.py
import mod1

def func2():
    mod1.func1B()
Finally I have driver.py:
# driver.py
import mod1

if __name__ == '__main__':
    mod1.func1A()
If I execute the command python mod1.py then the output is None. Based on the link I referenced above, it seems that there is some distinction between mod1.py being imported as __main__ and mod1.py being imported from mod2.py. Therefore, I created driver.py. If I execute the command python driver.py then I get the expected output: A. I sort of see the difference, but I don't really see the mechanism or the reason for it. How and why does this happen? It seems counterintuitive that the same module would exist twice. If I execute python mod1.py, would it be possible to access the variables in the __main__ version of mod1.py instead of the variables in the version imported by mod2.py?
The __name__ variable always contains the name of the module, except when the file has been loaded into the interpreter as a script instead. Then that variable is set to the string '__main__' instead.
After all, the script is then run as the main file of the whole program, everything else are modules imported directly or indirectly by that main file. By testing the __name__ variable, you can thus detect if a file has been imported as a module, or was run directly.
Internally, modules are given a namespace dictionary, which is stored as part of the metadata for each module, in sys.modules. The main file, the executed script, is stored in that same structure as '__main__'.
But when you import a file as a module, python first looks in sys.modules to see if that module has already been imported before. So, import mod1 means that we first look in sys.modules for the mod1 module. It'll create a new module structure with a namespace if mod1 isn't there yet.
So, if you both run mod1.py as the main file, and later import it as a python module, it'll get two namespace entries in sys.modules. One as '__main__', then later as 'mod1'. These two namespaces are completely separate. Your global var1 is stored in sys.modules['__main__'], but func1B is looking in sys.modules['mod1'] for var1, where it is None.
But when you use python driver.py, driver.py becomes the '__main__' main file of the program, and mod1 will be imported just once into the sys.modules['mod1'] structure. This time round, func1A stores var1 in the sys.modules['mod1'] structure, and that's what func1B will find.
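To see the two separate namespaces directly, here is a small self-contained sketch. It writes a throwaway script (two_copies_demo.py, a name invented for this example) that imports itself, runs it in a child interpreter, and reports what each copy holds.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical script that is run as __main__ and also imports itself by name.
SCRIPT = textwrap.dedent("""\
    import sys

    value = None

    def set_value():
        global value
        value = 'A'

    if __name__ == '__main__':
        import two_copies_demo   # second copy, stored as 'two_copies_demo'
        set_value()              # changes only the __main__ copy
        print(value, two_copies_demo.value,
              sys.modules['__main__'] is two_copies_demo)
""")


def run_two_copies_demo():
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "two_copies_demo.py"
        script.write_text(SCRIPT)
        result = subprocess.run([sys.executable, str(script)],
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()


demo_output = run_two_copies_demo()
print(demo_output)  # A None False
```

The __main__ copy sees 'A', the imported copy still has None, and the two module objects are not the same object, which is the situation mod1.py ends up in.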
Regarding a practical solution for using a module optionally as main script - supporting consistent cross-imports:
Solution 1:
See, for example, how Python's pdb module is run as a script: it imports itself when executing as __main__ (at the end):
#! /usr/bin/env python
"""A Python debugger."""
# (See pdb.doc for documentation.)
import sys
import linecache
...
# When invoked as main program, invoke the debugger on a script
if __name__ == '__main__':
    import pdb
    pdb.main()
I would only recommend moving the __main__ startup to the beginning of the script, like this:
#! /usr/bin/env python
"""A Python debugger."""
# When invoked as main program, invoke the debugger on a script
import sys
if __name__ == '__main__':
    ##assert os.path.splitext(os.path.basename(__file__))[0] == 'pdb'
    import pdb
    pdb.main()
    sys.exit(0)

import linecache
...
This way the module body is not executed twice - which is "costly", undesirable and sometimes critical.
Solution 2:
In rarer cases it is desirable to expose the actual script module __main__ even directly as the actual module alias (mod1):
# mod1.py
import sys
import mod2
...
if __name__ == '__main__':
    # use main script directly as cross-importable module
    _mod = sys.modules['mod1'] = sys.modules[__name__]
    ##_modname = os.path.splitext(os.path.basename(os.path.realpath(__file__)))[0]
    ##_mod = sys.modules[_modname] = sys.modules[__name__]
    func1A()
Known drawbacks:
reload(_mod) fails
pickle'ed classes would need extra mappings for unpickling (find_global ..)
I have a utility module in Python that needs to know the name of the application that it is being used in. Effectively this means the name of the top-level Python script that was invoked to start the application (i.e. the one where __name__ == "__main__" would be true). __name__ gives me the name of the current Python file, but how do I get the name of the top-most one in the call chain?
Having switched my Google query from "how to find the top-level script name" to "how to find the process name from Python", I found this overly thorough treatment of the topic. The summary is the following:
import __main__
import os
appName = os.path.splitext(os.path.basename(__main__.__file__))[0]
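One thing to watch when dropping the extension: str.strip takes a set of characters, not a suffix, so it can eat part of the name itself. The sketch below compares it with os.path.splitext; the helper name and the example path are made up.

```python
import os


def app_name_from_path(script_path):
    # splitext removes only the final extension, whatever the base name is.
    return os.path.splitext(os.path.basename(script_path))[0]


safe_name = app_name_from_path("/opt/tools/happy.py")
stripped_name = "happy.py".strip(".py")  # strips any of '.', 'p', 'y' from both ends

print(safe_name)      # happy
print(stripped_name)  # ha
```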
You could use the inspect module for this. For example:
a.py:
#!/usr/bin/python
import b
b.py:
#!/usr/bin/python
import inspect
print(inspect.stack()[-1][1])
Running python b.py prints b.py. Running python a.py prints a.py.
However, I'd like to second the suggestion of sys.argv[0] as a more sensible and idiomatic suggestion.
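A minimal sketch of the sys.argv[0] approach, assuming you want the bare script name without directory or extension. The helper name is invented; it returns None when there is no script file, since sys.argv[0] is an empty string in an interactive session and "-c" when the interpreter was started with -c.

```python
import os
import sys


def top_level_script_name(argv=None):
    """Return the invoked script's base name, or None if there is no script."""
    argv = sys.argv if argv is None else argv
    script = argv[0] if argv else ""
    if script in ("", "-c"):
        return None
    return os.path.splitext(os.path.basename(script))[0]


print(top_level_script_name(["/srv/app/run_server.py", "--port", "80"]))  # run_server
```

Passing argv explicitly is only there to make the helper easy to try out; in a utility module you would normally call it with no arguments.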
Can someone explain why executing the following code:
file "hello.py":
import hello
print "hello"
hello = reload(hello)
executing as python hello.py prints the following?
hello
hello
hello
hello
Why 4 times? I know that when a module is already imported it's not imported again, but reload forces a module to be reloaded even if it's already loaded. I'd have expected an unlimited number of 'hello' prints as a result.
What has to happen so reload won't reload a module?
python hello.py (A) runs the code once. When (A) calls import hello, the code is run again (B). When (A) and (B) each call reload(hello), the code is run twice more, for four times in total.
In general, for the lifetime of a program a module's code will be executed at the following times:
Once if it is the main module
When it is imported the first time by any module (including itself)
Any time reload() is called on the module
As for why the reload() is not called recursively, there is an early exit point to the PyImport_ReloadModule() function (CPython, file is import.c) to prevent this:
http://svn.python.org/view/python/trunk/Python/import.c?view=markup#l2646
...
existing_m = PyDict_GetItemString(modules_reloading, name);
if (existing_m != NULL) {
    /* Due to a recursive reload, this module is already
       being reloaded. */
    Py_INCREF(existing_m);
    return existing_m;
}
... load module code is below here
reload keeps a list (actually a dict) of modules it is currently reloading to avoid reloading modules recursively.
See http://hg.python.org/cpython/file/e6b8202443b6/Lib/imp.py#l236
This isn't documented, as such, but I think you can probably rely on it remaining the case.
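In Python 3 the built-in reload is gone and the function lives in importlib. If you want to check how a current interpreter behaves, a sketch like the following counts the prints for a Python 3 version of the script. The file name reload_self_demo.py is made up for this example, and the count is whatever your interpreter produces, so treat it as an experiment rather than a guarantee.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical Python 3 version of hello.py: it imports and reloads itself.
SCRIPT = textwrap.dedent("""\
    import importlib
    import reload_self_demo
    print("hello")
    reload_self_demo = importlib.reload(reload_self_demo)
""")


def count_hello_prints():
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "reload_self_demo.py"
        script.write_text(SCRIPT)
        result = subprocess.run([sys.executable, str(script)],
                                capture_output=True, text=True, check=True)
        return result.stdout.splitlines().count("hello")


hello_count = count_hello_prints()
print(hello_count)
```

A finite count here means the nested reload calls returned early instead of recursing, which is the same guard described above.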