Been a longtime browser of SO, finally asking my own questions!
So, I am writing an automation script/module that looks through a directory recursively for python modules with a specific name. If I find a module with that name, I load it dynamically, pull what I need from it, and then unload it. I noticed though that simply del'ing the module does not remove all references to that module, there is another lingering somewhere and I do not know where it is. I tried taking a peek at the source code, but couldn't make sense of it too well. Here is a sample of what I am seeing, greatly simplified:
I am using Python 3.5.2 (Anaconda v4.2.0). I am using importlib, and that is what I want to stick with. I also want to be able to do this with vanilla python-3.
I got the import from source from the python docs here (yes I am aware this is the Python 3.6 docs).
My main driver...
# main.py
import importlib.util
import sys
def foo():
spec = importlib.util.spec_from_file_location('a', 'a.py')
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
print(sys.getrefcount(module))
del module
del spec
if __name__ == '__main__':
foo()
print('THE END')
And my sample module...
# a.py
print('hello from a')
class A():
def __del__(self):
print('SO LONG A!')
inst = A()
Output:
python main.py
HELLO FROM A!
2
THE END
SO LONG A!
I expected to see "SO LONG A!" printed before "THE END". So, where is this other hidden reference to my module? I understand that my del's are gratuitous with the fact that I have it wrapped in a function. I just wanted the deletion and scope to be explicit. How do I get a.py to completely unload? I plan on dynamically loading a ton of modules like a.py, and I do not want to hold on to them any longer than I really have to. Is there something I am missing?
There is a circular reference here, the module object references objects that reference the module again.
This means the module is not cleared immediately (as the reference count never goes to 0 by itself). You need to wait for the circle to be broken by the garbage collector.
You can force this by calling gc.collect():
import gc
# ...
if __name__ == '__main__':
foo()
gc.collect()
print('THE END')
With that in place, the output becomes:
$ python main.py
hello from a
2
SO LONG A!
THE END
Related
I have two python files. One is main.py which I execute. The other is test.py, from which I import a class in main.py.
Here is the sample code for main.py:
from test import Test
if __name__ == '__main__':
print("You're inside main.py")
test_object = Test()
And, here is the sample code for test.py:
class Test:
def __init__(self):
print("you're initializing the class.")
if __name__ == '__main__':
print('You executed test.py')
else:
print('You executed main.py')
Finally, here's the output, when you execute main.py:
You executed main.py
You're inside main.py
you're initializing the class.
From the order of outputs above, you can see that once you import a piece of a file, the whole file gets executed immediately. I am wondering why? what's the logic behind that?
I am coming from java language, where all files included a single class with the same name. I am just confused that why python behaves this way.
Any explanation would be appricated.
What is happening?
When you import the test-module, the interpreter runs through it, executing line by line. Since the if __name__ == '__main__' evaluates as false, it executes the else-clause. After this it continues beyond the from test import Test in main.py.
Why does python execute the imported file?
Python is an interpreted language. Being interpreted means that the program being read and evaluated one line at the time. Going through the imported module, the interpreter needs to evaluate each line, as it has no way to discern which lines are useful to the module or not. For instance, a module could have variables that need to be initialized.
Python is designed to support multiple paradigms. This behavior is used in some of the paradigms python supports, such as procedural programming.
Execution allows the designer of that module to account for different use cases. The module could be imported or run as a script. To accommodate this, some functions, classes or methods may need to be redefined. As an example, a script could output non-critical errors to the terminal, while an imported module to a log-file.
Why specify what to import?
Lets say you are importing two modules, both with a Test-class. If everything from those modules is imported, only one version of the Test-class can exist in our program. We can resolve this issue using different syntax.
import package1
import package2
package1.Test()
packade2.Test()
Alternatively, you can rename them with the as-keyword.
from package1 import Test
from package2 import Test as OtherTest
Test()
OtherTest()
Dumping everything into the global namepace (i.e from test import *) pollutes the namespace of your program with a lot of definitions you might not need and unintentionally overwrite/use.
where all files included a single class with the same name
There is not such requirement imposed in python, you can put multiple classes, functions, values in single .py file for example
class OneClass:
pass
class AnotherClass:
pass
def add(x,y):
return x+y
def diff(x,y):
return x-y
pi = 22/7
is legal python file.
According to interview with python's creator modules mechanism in python was influenced by Modula-2 and Modula-3 languages. So maybe right question is why creators of said languages elected to implement modules that way?
I've found plenty about calling variables from another module, but I can't find anything about editing a variable in a separate module.
For my first actual project with python I'm writing a little text adventure.
I've got GV.py for global variables, and Classes.py for weapons/armor etc.
I want Classes.Longsword() to modify a variable in GV.py, specifically variable GV.weapon.
In GV.py
import Classes
global weapon
weapon = 'Unarmed'
in Classes.py
import GV
def Longsword():
GV.weapon = 'Longsword'
This does not edit the variable weapon in GV.py.. Am I missing something?
Since it was asked I'll put the output here.
repl from GV.py
weapon
'Unarmed'
Classes.Longsword()
>>>>
weapon
'Unarmed'
Your problem is that, when you run the script GV.py, what you get isn't a module named GV, but a special module named __main__.
If you've seen the familiar main guard idiom, that can be used to write a single file that works as a module or as a top-level script, this is exactly how it works:
if __name__ == '__main__':
That will be true when you're running the file as a script, and it will not be true when you're importing the file as a module.
Now, when you do import GV from inside Classes, that imports the exact same GV.py file, and builds a second module object out of it (and this one, of course, is named GV).
So when you then do GV.weapon = 'longsword', you're changing the weapon global in the GV module, not the __main__ module.
If you're wondering how this works under the covers: the way Python makes sure there's only one Classes module object no matter how many times you import Classes is dead simple: it just caches loaded modules in a dict, sys.modules, and looks them up by name.
If you fix that, you've got a second problem here: the first thing GV.py does is to import Classes. The first thing Classes.py does is to import GV. This is a circular import.
The usual way to deal with both of these problems is pretty simple: put the shared globals in a simple module that does little or nothing beyond holding those shared globals, so everyone can import it:
# main.py
import classes
import GV
GV.weapon = 'Unarmed'
# Classes.py
import GV
def Longsword():
GV.weapon = 'Longsword'
# GV.py
# empty... or you can move weapon = 'Unarmed' here
I am a python beginne, and am currently learning import modules in python.
So my question is:
Suppose I currently have three python files, which is module1.py, module2.py, and module3.py;
In module1.py:
def function1():
print('Hello')
In module2.py, in order to use those functions in module1.py:
import module1
#Also, I have some other public functions in this .py file
def function2():
print('Goodbye')
#Use the function in module1.py
if __name__ == '__main__':
module1.function1();
function2();
In module3.py, I would like to use both the functions from module1.py and module2.py.
import module1
import module2
def function3():
print('Nice yo meet you');
if __name__ == '__main__':
module1.function1()
function3()
module2.function2()
Seems like it works. But my questions are mainly on module3.py. The reason is that in module3.py, I imported both module1 and module2. However, module1 is imported by module2 already. I am just wondering if this is a good way to code? Is this effective? Should I do this? or Should I just avoid doing this and why?
Thank you so much. I am just a beginner, so if I ask stupid questions, please forgive me. Thank you!!
There will be no problem if you avoid circular imports, that is you never import a module that itself imports the current importing module.
A module does not see the importer namespace, so imports in the importer code don't become globals to the imported module.
Also module top-level code will run on first import only.
Edit 1:
I am answering Filipe's comments here because its easier.
"There will be no problem if you avoid circular imports" -> This is incorrect, python is fine with circular imports for the most part."
The fact that you sensed some misconception of mine, doesn't make that particular statement incorrect. It is correct and it is good advice.
(Saying it's fine for the most part looks a bit like saying something will run fine most of time...)
I see what you mean. I avoid it so much that I even thought your first example would give an error right away (it doesn't). You mean there is no need to avoid it because most of the time (actually given certain conditions) Python will go fine with it. I am also certain that there are cases where circular imports would be the easiest solution. That doesn't mean we should use them if we have a choice. That would promote the use of a bad architecture, where every module starts depending on every other.
It also means the coder has to be aware of the caveats.
This link I found here in SO states some of the worries about circular imports.
The previous link is somewhat old so info can be outdated by newer Python versions, but import confusion is even older and still apllies to 3.6.2.
The example you give works well because relevant or initialization module code is wrapped in a function and will not run at import time. Protecting code with an if __name__ == "__main__": also removes it from running when imported.
Something simple like this (the same example from effbot.org) won't work (remember OP says he is a beginner):
# file y.py
import x
x.func1()
# file x.py
import y
def func1():
print('printing from x.func1')
On your second comment you say:
"This is also incorrect. An imported module will become part of the namespace"
Yes. But I didn't mention that, nor its contrary. I just said that an imported module code doesn't know the namespace of the code making the import.
To eliminate the ambiguity I just meant this:
# w.py
def funcw():
print(z_var)
# z.py
import w
z_var = 'foo'
w.funcw() # error: z_var undefined in w module namespace
Running z.py gives the stated error. That's all that I meant.
Now going further, to get the access we want, we go circular...
# w.py
import z # go circular
def funcw():
'''Notice that we gain access not to the z module that imported
us but to the z module we import (yes its the same thing but
carries a different namespace). So the reference we obtain
points to a different object, because it really is in a
different namespace.'''
print(z.z_var, id(z.z_var))
...and we protect some code from running with the import:
# z.py
import w
z_var = ['foo']
if __name__ == '__main__':
print(z_var, id(z_var))
w.funcw()
By running z.py we confirm the objects are different (they can be the same with immutables, but that is python kerning - internal optimization, or implementation details - at work):
['foo'] 139791984046856
['foo'] 139791984046536
Finally I agree with your third comment about being explicit with imports.
Anyway I thank your comments. I actually improved my understanding of the problem because of them (we don't learn much about something by just avoiding it).
If I execfile a module, and remove all (of my) reference to that module, it's functions continue to work as expected. That's normal.
However, if that execfile'd module imports other modules, and I remove all references to those modules, the functions defined in those modules start to see all their global values as None. This causes things to fail spectacularly, of course, and in a very supprising manner (TypeError NoneType on string constants, for example).
I'm surprised that the interpreter makes a special case here; execfile doesn't seem special enough to cause functions to behave differently wrt module references.
My question: Is there any clean way to make the execfile-function behavior recursive (or global for a limited context) with respect to modules imported by an execfile'd module?
To the curious:
The application is reliable configuration reloading under buildbot. The buildbot configuration is executable python, for better or for worse. If the executable configuration is a single file, things work fairly well. If that configuration is split into modules, any imports from the top-level file get stuck to the original version, due to the semantics of __import__ and sys.modules. My strategy is to hold the contents of sys.modules constant before and after configuration, so that each reconfig looks like an initial configuration. This almost works except for the above function-global reference issue.
Here's a repeatable demo of the issue:
import gc
import sys
from textwrap import dedent
class DisableModuleCache(object):
"""Defines a context in which the contents of sys.modules is held constant.
i.e. Any new entries in the module cache (sys.modules) are cleared when exiting this context.
"""
modules_before = None
def __enter__(self):
self.modules_before = sys.modules.keys()
def __exit__(self, *args):
for module in sys.modules.keys():
if module not in self.modules_before:
del sys.modules[module]
gc.collect() # force collection after removing refs, for demo purposes.
def reload_config(filename):
"""Reload configuration from a file"""
with DisableModuleCache():
namespace = {}
exec open(filename) in namespace
config = namespace['config']
del namespace
config()
def main():
open('config_module.py', 'w').write(dedent('''
GLOBAL = 'GLOBAL'
def config():
print 'config! (old implementation)'
print GLOBAL
'''))
# if I exec that file itself, its functions maintain a reference to its modules,
# keeping GLOBAL's refcount above zero
reload_config('config_module.py')
## output:
#config! (old implementation)
#GLOBAL
# If that file is once-removed from the exec, the functions no longer maintain a reference to their module.
# The GLOBAL's refcount goes to zero, and we get a None value (feels like weakref behavior?).
open('main.py', 'w').write(dedent('''
from config_module import *
'''))
reload_config('main.py')
## output:
#config! (old implementation)
#None
## *desired* output:
#config! (old implementation)
#GLOBAL
acceptance_test()
def acceptance_test():
# Have to wait at least one second between edits (on ext3),
# or else we import the old version from the .pyc file.
from time import sleep
sleep(1)
open('config_module.py', 'w').write(dedent('''
GLOBAL2 = 'GLOBAL2'
def config():
print 'config2! (new implementation)'
print GLOBAL2
## There should be no such thing as GLOBAL. Naive reload() gets this wrong.
try:
print GLOBAL
except NameError:
print 'got the expected NameError :)'
else:
raise AssertionError('expected a NameError!')
'''))
reload_config('main.py')
## output:
#config2! (new implementation)
#None
#got the expected NameError :)
## *desired* output:
#config2! (new implementation)
#GLOBAL2
#got the expected NameError :)
if __name__ == '__main__':
main()
I don't think you need the 'acceptance_test' part of things here. The issue isn't actually weakrefs, it's modules' behavior on destruction. They clear out their __dict__ on delete. I vaguely remember that this is done to break ref cycles. I suspect that global references in function closures do something fancy to avoid a hash lookup on every invocation, which is why you get None and not a NameError.
Here's a much shorter sscce:
import gc
import sys
import contextlib
from textwrap import dedent
#contextlib.contextmanager
def held_modules():
modules_before = sys.modules.keys()
yield
for module in sys.modules.keys():
if module not in modules_before:
del sys.modules[module]
gc.collect() # force collection after removing refs, for demo purposes.
def main():
open('config_module.py', 'w').write(dedent('''
GLOBAL = 'GLOBAL'
def config():
print 'config! (old implementation)'
print GLOBAL
'''))
open('main.py', 'w').write(dedent('''
from config_module import *
'''))
with held_modules():
namespace = {}
exec open('main.py') in namespace
config = namespace['config']
config()
if __name__ == '__main__':
main()
Or, to put it another way, don't delete modules and expect their contents to continue functioning.
You should consider importing the configuration instead of execing it.
I use import for a similar purpose, and it works great. (specifically, importlib.import_module(mod)). Though, my configs consists mainly of primitives, not real functions.
Like you, I also have a "guard" context to restore the original contents of sys.modules after the import. Plus, I use sys.dont_write_bytecode = True (of course, you can add that to your DisableModuleCache -- set to True in __enter__ and to False in __exit__). This would ensure the config actually "runs" each time you import it.
The main difference between the two approaches, (other than the fact you don't have to rely on the state the interpreter stays in after execing (which I consider semi-unclean)), is that the config files are identified by their module-name/path (as used for importing) rather than the file name.
EDIT: A link to the implementation of this approach, as part of the Figura package.
I just made a fresh copy of eclipse and installed pydev.
In my first trial to use pydev with eclipse, I created 2 module under the src package(the default one)
FirstModule.py:
'''
Created on 18.06.2009
#author: Lars Vogel
'''
def add(a,b):
return a+b
def addFixedValue(a):
y = 5
return y +a
print "123"
run.py:
'''
Created on Jun 20, 2011
#author: Raymond.Yeung
'''
from FirstModule import add
print add(1,2)
print "Helloword"
When I pull out the pull down menu of the run button, and click "ProjectName run.py", here is the result:
123
3
Helloword
Apparantly both module ran, why? Is this the default setting?
When you import a module, everything in it is "run". This means that classes and function objects are created, global variables are set, and print statements are executed. *)
It is common practice to enclose statements only meant to be executed when the module is run directly in an if-block such as this:
if __name__ == "__main__":
print "123"
Now if you run the module as a script, __name__ is set to "__main__", so "123" will be printed. However, if you import the module from somewhere else __name__ will be "FirstModule" in your case, not "__main__", so whatever is in the block will not be executed.
*) Note that if you import the same module again, it is not "run" again. Python keeps track of imported modules and just re-uses the already imported module the second time. This makes C/C++ tricks like enclosing header file bodies with IFNDEF statements to make sure the header is only imported once unnecessary in python.