Recursive module import and reload

Can someone explain why executing the following code:
file "hello.py":
import hello
print "hello"
hello = reload(hello)
executing as python hello.py prints the following?
hello
hello
hello
hello
Why 4 times? I know that when a module is already imported it's not imported again, but reload forces a module to be reloaded even if it's already loaded. I'd have expected an unlimited number of 'hello' prints as a result.
What has to happen so reload won't reload a module?

python hello.py (A) runs the code once. When (A) calls import hello, the code is run again (B). When (A) and (B) each call reload(hello), the code is run twice more, for four times in total.
In general, for the lifetime of a program a module's code will be executed at the following times:
Once if it is the main module
When it is imported the first time by any module (including itself)
Any time reload() is called on the module
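These rules can be checked with a small Python 3 experiment. This is only a sketch: the module name exec_count_demo and the helper count_runs are made up for the illustration, and importlib.reload stands in for the Python 2 reload() builtin used in the question. It writes a throwaway module to a temporary directory and counts how often its top-level print runs.

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

# Hypothetical throwaway module whose only top-level statement is a print.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "exec_count_demo.py"), "w") as f:
    f.write('print("hello")\n')
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

def count_runs(action):
    """Run action() and count how many times the module body printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        action()
    return buffer.getvalue().count("hello")

# First import: the module's code is executed.
first = count_runs(lambda: importlib.import_module("exec_count_demo"))
# Second import: served from sys.modules, nothing is executed.
again = count_runs(lambda: importlib.import_module("exec_count_demo"))
# Explicit reload: the code is executed again.
reloaded = count_runs(lambda: importlib.reload(sys.modules["exec_count_demo"]))

print(first, again, reloaded)  # 1 0 1
```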
As for why the reload() is not called recursively, there is an early exit point to the PyImport_ReloadModule() function (CPython, file is import.c) to prevent this:
http://svn.python.org/view/python/trunk/Python/import.c?view=markup#l2646
...
existing_m = PyDict_GetItemString(modules_reloading, name);
if (existing_m != NULL) {
    /* Due to a recursive reload, this module is already
       being reloaded. */
    Py_INCREF(existing_m);
    return existing_m;
}
... load module code is below here

reload keeps a list (actually a dict) of modules it is currently reloading to avoid reloading modules recursively.
See http://hg.python.org/cpython/file/e6b8202443b6/Lib/imp.py#l236
This isn't documented, as such, but I think you can probably rely on it remaining the case.
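The guard can be modelled in a few lines of Python. This is a simplified toy, not CPython's actual implementation: toy_reload, hello_body and the executions list are invented for the illustration, and only the idea of a dict of modules currently being reloaded comes from the answers above.

```python
# Simplified model of the recursion guard; not CPython's real code.
modules_reloading = {}   # name -> module object currently being reloaded
executions = []          # one entry per execution of a module body

def toy_reload(name, module, body):
    """Re-run body(module) unless `name` is already being reloaded."""
    if name in modules_reloading:
        # Recursive reload: hand back the module that is mid-reload.
        return modules_reloading[name]
    modules_reloading[name] = module
    try:
        executions.append(name)
        body(module)
        return module
    finally:
        del modules_reloading[name]

def hello_body(module):
    # Stands in for a module whose own code calls reload on itself.
    toy_reload("hello", module, hello_body)

hello = object()  # placeholder for a real module object
result = toy_reload("hello", hello, hello_body)
print(len(executions))  # 1
```

The inner call returns immediately because "hello" is still in the dict, so the body runs once per outer reload instead of recursing forever.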

Related

Why are the other lines of a Python file (say test.py) run when you import just one piece of test.py from somewhere else (say main.py)?

I have two python files. One is main.py which I execute. The other is test.py, from which I import a class in main.py.
Here is the sample code for main.py:
from test import Test
if __name__ == '__main__':
    print("You're inside main.py")
    test_object = Test()
And, here is the sample code for test.py:
class Test:
    def __init__(self):
        print("you're initializing the class.")
if __name__ == '__main__':
    print('You executed test.py')
else:
    print('You executed main.py')
Finally, here's the output, when you execute main.py:
You executed main.py
You're inside main.py
you're initializing the class.
From the order of the output above, you can see that as soon as you import one piece of a file, the whole file is executed immediately. Why? What's the logic behind that?
I am coming from Java, where every file contains a single class with the same name as the file, so I am confused about why Python behaves this way.
Any explanation would be appreciated.

What is happening?
When you import the test module, the interpreter runs through it, executing it statement by statement. Since the condition __name__ == '__main__' evaluates to false there, the else clause is executed. After that, execution continues after the from test import Test line in main.py.
Why does python execute the imported file?
Python is an interpreted language. Being interpreted means that the program is read and evaluated one statement at a time. Going through the imported module, the interpreter needs to execute each statement, as it has no way to tell which statements the module needs and which it does not. Even class and def are statements that create their objects only when they are executed. For instance, a module could have variables that need to be initialized.
Python is designed to support multiple paradigms. This behavior is used in some of the paradigms Python supports, such as procedural programming.
Execution allows the designer of that module to account for different use cases. The module could be imported or run as a script. To accommodate this, some functions, classes or methods may need to be defined differently. As an example, a script could output non-critical errors to the terminal, while an imported module writes them to a log file.
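As a rough sketch of that last example (make_reporter and the message text are hypothetical, not taken from any real module), the choice can be made once, at the moment the module's top-level code runs:

```python
import logging
import sys

def make_reporter(module_name):
    """Pick where non-critical errors go, based on how the module is used."""
    if module_name == "__main__":
        # Run as a script: write straight to the terminal.
        def report(message):
            print(message, file=sys.stderr)
    else:
        # Imported: hand the message to the logging system instead.
        logger = logging.getLogger(module_name)

        def report(message):
            logger.warning(message)
    return report

# This line runs at import time or at script start, whichever applies.
report = make_reporter(__name__)
report("disk cache is missing, continuing without it")
```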
Why specify what to import?
Let's say you are importing two modules, both with a Test class. If everything from both modules is imported into one namespace, only one version of the Test class can exist in our program. We can resolve this by keeping the module names as qualifiers.
import package1
import package2
package1.Test()
package2.Test()
Alternatively, you can rename them with the as keyword.
from package1 import Test
from package2 import Test as OtherTest
Test()
OtherTest()
Dumping everything into the global namespace (i.e. from test import *) pollutes the namespace of your program with a lot of definitions you might not need and might unintentionally overwrite or use.
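The overwrite can be reproduced without any files on disk. In this sketch the two modules are built in memory with types.ModuleType and registered in sys.modules; package1 and package2 are the placeholder names from above, and the origin attribute is invented so the two classes can be told apart.

```python
import sys
import types

# Build two stand-in modules in memory and register them so that the
# import statement can find them.
for name in ("package1", "package2"):
    module = types.ModuleType(name)
    exec(f"class Test:\n    origin = {name!r}\n", module.__dict__)
    sys.modules[name] = module

# Star imports into one namespace: the later import silently wins.
star_namespace = {}
exec("from package1 import *\nfrom package2 import *\n", star_namespace)
print(star_namespace["Test"].origin)  # package2

# Qualified access keeps both classes reachable.
import package1
import package2
print(package1.Test.origin, package2.Test.origin)  # package1 package2
```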

where all files included a single class with the same name
There is no such requirement in Python. You can put multiple classes, functions and values in a single .py file. For example,
class OneClass:
    pass
class AnotherClass:
    pass
def add(x,y):
    return x+y
def diff(x,y):
    return x-y
pi = 22/7
is a legal Python file.
According to an interview with Python's creator, the module mechanism in Python was influenced by the Modula-2 and Modula-3 languages. So maybe the right question is why the creators of those languages chose to implement modules that way.

Why is Hello printed only two times?

main.py
#main.py
import main
print('Hello')
Output:
Hello
Hello
I believe that when it comes to the line import main, at that time, main is registered in sys.modules and hence the import statement of another script - which I believe, is not a part of __main__ - is not executed. Can someone please tell me whether I understand it correctly? If not, please give an explanation.

Let's add a little debugging output:
import sys
print([key for key in sys.modules.keys() if 'main' in key])
import main
It prints:
['__main__']
['__main__', 'main']
Why is that?
If you run a module as a script, it is not added to sys.modules under its own module name. Instead, it is always registered as __main__.
If you then import the module by its name (main), that name is not present in sys.modules. As a result, the module is imported again, its code is executed, and the module is stored in sys.modules under its name.
On executing main.py it will print ['__main__'] and on the re-import it will print both module names: ['__main__', 'main'].
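This can be reproduced end to end by running the script in a child interpreter. The sketch below is a variant of the debugging snippet: it checks only the two names of interest instead of every key containing 'main', writes main.py to a temporary directory, and runs it the way the question does.

```python
import os
import subprocess
import sys
import tempfile

# Contents of the temporary main.py.
SCRIPT = '''\
import sys
print([name for name in ("__main__", "main") if name in sys.modules])
import main
print("Hello")
'''

tmpdir = tempfile.mkdtemp()
script_path = os.path.join(tmpdir, "main.py")
with open(script_path, "w") as f:
    f.write(SCRIPT)

# Equivalent of typing: python main.py
result = subprocess.run(
    [sys.executable, script_path],
    capture_output=True, text=True, cwd=tmpdir,
)
print(result.stdout)
# ['__main__']
# ['__main__', 'main']
# Hello
# Hello
```

The first Hello comes from the copy imported as main, the second from the copy running as __main__.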
This implies one rule: try not to import the module you are running anywhere in your code.

It prints only twice because a module is loaded only once under a given name. This prevents possible unbounded recursion. So your print statement gets executed once by the imported module and once by the main program.

Since you're importing main inside main, the print statement is executed twice. That's how Python works.

Does using 'import module_name' statement in a function cause the module to be reloaded?

I know that when we do 'import module_name', then it gets loaded only once, irrespective of the number of times the code passes through the import statement.
But if we move the import statement into a function, then for each function call does the module get re-loaded? If not, then why is it a good practice to import a module at the top of the file, instead of in function?
Does this behavior change for a multi threaded or multi process app?

It does not get loaded every time.
Proof:
file.py:
print('hello')
file2.py:
def a():
    import file
a()
a()
Output:
hello
Then why put it at the top?
Because an import inside a function makes every call to that function take slightly longer.

I know that when we do 'import module_name', then it gets loaded only once, irrespective of the number of times the code passes through the import statement.
Right!
But if we move the import statement into a function, then for each function call does the module get re-loaded?
No. But if you want to, you can reload it explicitly, like this:
import importlib
importlib.reload(target_module)
If not, then why is it a good practice to import a module at the top of the file, instead of in function?
When Python imports a module, it first checks the module registry (sys.modules) to see if the module is already imported. If that’s the case, Python uses the existing module object as is.
Even though the module is not reloaded, Python still has to check whether it has already been imported. So a small amount of unnecessary extra work is done each time the function is called.
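A quick way to see the registry lookup at work, using the standard json module purely as an example (get_json is a made-up helper name):

```python
import sys

def get_json():
    # After the first call this is only a sys.modules lookup plus a local
    # name binding; the json module's code is not executed again.
    import json
    return json

first = get_json()
second = get_json()
print(first is second)               # True
print(first is sys.modules["json"])  # True
```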

It doesn't get reloaded after every function call and threading does not change this behavior. Here's how I tested it:
test.py:
print("Loaded")
testing.py:
import _thread
def call():
    import test
for i in range(10):
    call()
_thread.start_new_thread(call, ())
_thread.start_new_thread(call, ())
OUTPUT:
Loaded
To answer your second question, if you import the module at the top of the file, the module will be imported for all functions within the python file. This saves you from having to import the same module in multiple functions if they use the same module.
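For the multi-threaded case, here is a sketch of a variant in which no thread has imported the module beforehand and every thread is joined before the result is read. The module name thread_import_demo is made up, and the module body is the same single print as in test.py above.

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile
import threading

# Hypothetical throwaway module with a single top-level print.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "thread_import_demo.py"), "w") as f:
    f.write('print("Loaded")\n')
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

def call():
    import thread_import_demo

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    threads = [threading.Thread(target=call) for _ in range(8)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

# Eight threads ran the import statement; the module body ran once.
print(buffer.getvalue().count("Loaded"))  # 1
```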

Python C extension importer how to use import lock

Background
I'm building an importer to encrypt my code. When compiling, I encrypt the code object in pyc files. When loading, I use this customized importer to decrypt the code object before execution.
Since my code is bundled in zip format, I decided to modify zipimporter from the builtin zipimport package by adding the decrypt logic into it.
I'm using Python 3.7
Problem
I modified zipimport.c and made it into a C extension. The encryption and decryption work fine, but I started to see ImportError and AttributeError. For example:
"""module: test.py"""
class Foo:
    def bar(self):
        pass
foo = Foo()
# Import Error: can not import name 'foo' from test
# Attribute Error: foo has no attribute bar
When the server starts, this error occurs randomly, and only during the first few minutes. So I suspect that it's a multithreading problem caused by a thread seeing a partially initialized module object.
What I tried
After checking the original C code, the implementation for loading a module is:
mod = PyImport_AddModuleObject(fullname);
// ... some code
mod = PyImport_ExecCodeModuleObject(fullname, code, modpath, NULL);
It first adds the module to sys.modules, then executes some C code, and finally executes the module code. If another thread reads sys.modules before the module code has finished executing, it can get the ImportError and AttributeError.
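The window itself can be observed from pure Python, even in a single thread. The sketch below only illustrates that a module is registered before its body has finished; it says nothing about how the C importer should take the import lock. The module name partial_init_demo and its attributes are made up.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module that inspects its own sys.modules entry while its
# body is still executing.
SOURCE = '''\
import sys
me = sys.modules[__name__]            # already registered at this point
foo_visible_early = hasattr(me, "foo")
foo = object()
foo_visible_late = hasattr(me, "foo")
'''

tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "partial_init_demo.py"), "w") as f:
    f.write(SOURCE)
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import partial_init_demo
print(partial_init_demo.foo_visible_early)  # False
print(partial_init_demo.foo_visible_late)   # True
```

Any code that gets hold of the module object during that window, without waiting for the import to finish, sees it without foo.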
I just copied the original implementation, which doesn't use the import lock. I guessed that locking is normally handled by the interpreter at an outer level and that I have to take the import lock explicitly. So I wrapped the code block with the import lock as follows:
#include "import.h"
_PyImport_AcquireLock();
// wrapped code block
_PyImport_ReleaseLock();
My Question
After adding import lock, the error still happens. I'm not familiar with python c api. Have I used the lock in a correct way? Are there other possible reasons that can cause this error?
Code
To test, put the following files in same folder:
zzimporter.c (lock added in func zipimport_zipimporter_load_module_impl)
zzimporter.h
setup.py
Run python setup.py build_ext --inplace and it will generate zzimporter.so, which works just like the built-in zipimport module.

Why do all modules run together?

I just made a fresh copy of eclipse and installed pydev.
In my first attempt at using pydev with eclipse, I created two modules under the src package (the default one).
FirstModule.py:
'''
Created on 18.06.2009
#author: Lars Vogel
'''
def add(a,b):
    return a+b
def addFixedValue(a):
    y = 5
    return y +a
print "123"
run.py:
'''
Created on Jun 20, 2011
#author: Raymond.Yeung
'''
from FirstModule import add
print add(1,2)
print "Helloword"
When I pull out the pull down menu of the run button, and click "ProjectName run.py", here is the result:
123
3
Helloword
Apparently both modules ran. Why? Is this the default setting?

When you import a module, everything in it is "run". This means that classes and function objects are created, global variables are set, and print statements are executed. *)
It is common practice to enclose statements only meant to be executed when the module is run directly in an if-block such as this:
if __name__ == "__main__":
    print "123"
Now if you run the module as a script, __name__ is set to "__main__", so "123" will be printed. However, if you import the module from somewhere else, __name__ will be "FirstModule" in your case, not "__main__", so whatever is in the block will not be executed.
*) Note that if you import the same module again, it is not "run" again. Python keeps track of imported modules and just re-uses the already imported module the second time. This makes C/C++ tricks like wrapping header file bodies in #ifndef guards, to make sure the header is only included once, unnecessary in Python.
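The two values of __name__ can be observed from one Python 3 program. This is a sketch with a made-up module name (name_guard_demo); runpy.run_path is used to execute the file the way running it as a script would.

```python
import importlib
import os
import runpy
import sys
import tempfile

# Hypothetical module that records how it was started.
SOURCE = 'seen_name = __name__\nran_as_script = (__name__ == "__main__")\n'

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "name_guard_demo.py")
with open(path, "w") as f:
    f.write(SOURCE)
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

# Imported: __name__ is the module's own name.
import name_guard_demo
print(name_guard_demo.seen_name)      # name_guard_demo
print(name_guard_demo.ran_as_script)  # False

# Executed like a script: __name__ is "__main__".
as_script = runpy.run_path(path, run_name="__main__")
print(as_script["ran_as_script"])     # True
```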
