Unloading an Unused Module in Python (A Very Specific Case)

First of all, I must tell you that I have already looked at this bug,
and I understand that the feature has long been considered impossible in general. However, I have a use case which is very specific. Hence, I will try to describe my use case and ask for suggestions.
I am writing an interactive Python application which is run from the interpreter. The user might make a mistake or two when importing modules, and I would like to provide a method for the user to delete a module as soon as they have imported it.
This means one of the usual problems, references to the module already being incorporated into other objects, is gone. Once I am sure that the module has not been used at all, what can I do to remove it? Is it still technically possible?
I was thinking that if I could create a function which manually deletes every object/function created by the module when imported, then what I want might be accomplished. Is this true?
IPython does a similar operation with its extensions. Is that the correct way to go?

Modules are just a namespace, an object, stored in the sys.modules mapping.
If there are no other references to anything belonging in that module, you can remove it from the sys.modules mapping to delete it again:
>>> import timeit
>>> import sys
>>> del timeit # remove local reference to the timeit module
>>> del sys.modules['timeit'] # remove module
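To confirm that a later import really does start from scratch after the deletion, here is a quick sketch using the stdlib json module as a stand-in:

```python
import sys
import json

first = sys.modules['json']
del sys.modules['json']      # drop the cached module

import json                  # re-executes the module's code from disk
print(json is first)         # False: a brand-new module object was created
```

If any other object still held a reference to the old module (or to anything defined in it), that reference would keep pointing at the old, now-orphaned module object.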

Importing variable from another file changes it

While developing a largeish project (split in several files and folders) in Python with IPython, I run into the trouble of cached imported modules.
The problem is that the statement import module only reads the module once, even if that module has changed! So each time I change something in my package, I have to quit and restart IPython. Painful.
Is there any way to properly force reloading some modules? Or, better, to somehow prevent Python from caching them?
I tried several approaches, but none works. In particular I run into really, really weird bugs, like some modules or variables mysteriously becoming equal to None...
The only sensible resource I found is Reloading Python modules, from pyunit, but I have not checked it. I would like something like that.
A good alternative would be for IPython to restart, or restart the Python interpreter somehow.
So, if you develop in Python, what solution have you found to this problem?
Edit
To make things clear: obviously, I understand that some old variables depending on the previous state of the module may stick around. That's fine by me. But why is it so difficult in Python to force-reload a module without all sorts of strange errors happening?
More specifically, if I have my whole module in one file module.py then the following works fine:
import sys
try:
    del sys.modules['module']
except KeyError:
    pass
import module
obj = module.my_class()
This piece of code works beautifully and I can develop without quitting IPython for months.
However, whenever my module is made of several submodules, hell breaks loose:
import sys
for mod in ['module.submod1', 'module.submod2']:
    try:
        del sys.modules[mod]
    except KeyError:
        pass
# sometimes this works, sometimes not. WHY?
Why is that so different for Python whether I have my module in one big file or in several submodules? Why would that approach not work??
import checks to see if the module is in sys.modules, and if it is, it returns it. If you want import to load the module fresh from disk, you can delete the appropriate key in sys.modules first.
There is the reload built-in function (importlib.reload in Python 3) which, given a module object, will reload it from disk and place the result in sys.modules. Edit -- actually, it will recompile the code from the file on disk and then re-evaluate it in the existing module's __dict__, which is potentially very different from making a new module object.
Mike Graham is right though; getting reloading right is hard if even a few live objects still reference the contents of the module you no longer want. Existing objects will still reference the classes they were instantiated from, which is an obvious issue, but all names created by means of from module import symbol will also still point to objects from the old version of the module. Many subtly wrong things are possible.
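The from-import pitfall can be demonstrated with a throwaway module written to a temp directory (the module name stale_demo and its answer function are made up for this sketch; Python 3):

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True           # no .pyc, so reload always reads the source
d = tempfile.mkdtemp()
path = os.path.join(d, 'stale_demo.py')  # hypothetical throwaway module
with open(path, 'w') as f:
    f.write('def answer():\n    return 1\n')
sys.path.insert(0, d)

import stale_demo
from stale_demo import answer

with open(path, 'w') as f:               # change the source on disk
    f.write('def answer():\n    return 2\n')
importlib.reload(stale_demo)

print(stale_demo.answer())  # 2 -- attribute lookup sees the reloaded code
print(answer())             # 1 -- the from-imported name still holds the old function
```

The module object itself was refreshed in place, but the name bound by the earlier from-import still points at the function object from the first version.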
Edit: I agree with the consensus that restarting the interpreter is by far the most reliable thing. But for debugging purposes, I guess you could try something like the following. I'm certain that there are corner cases for which this wouldn't work, but if you aren't doing anything too crazy (otherwise) with module loading in your package, it might be useful.
import sys
import types

def reload_package(root_module):
    package_name = root_module.__name__

    # get a reference to each loaded module
    loaded_package_modules = dict([
        (key, value) for key, value in sys.modules.items()
        if key.startswith(package_name) and isinstance(value, types.ModuleType)])

    # delete references to these loaded modules from sys.modules
    for key in loaded_package_modules:
        del sys.modules[key]

    # load each of the modules again;
    # make old modules share state with new modules
    for key in loaded_package_modules:
        print('reloading %s' % key)
        __import__(key)                # re-imports and re-registers the module
        newmodule = sys.modules[key]   # __import__ returns the top package, so look up the submodule
        oldmodule = loaded_package_modules[key]
        oldmodule.__dict__.clear()
        oldmodule.__dict__.update(newmodule.__dict__)
Which I very briefly tested like so:
import email, email.mime, email.mime.application
reload_package(email)
printing:
reloading email.iterators
reloading email.mime
reloading email.quoprimime
reloading email.encoders
reloading email.errors
reloading email
reloading email.charset
reloading email.mime.application
reloading email._parseaddr
reloading email.utils
reloading email.mime.base
reloading email.message
reloading email.mime.nonmultipart
reloading email.base64mime
Quitting and restarting the interpreter is the best solution. Any sort of live reloading or no-caching strategy will not work seamlessly, because objects from no-longer-existing modules can still exist, because modules sometimes store state, and because even if your use case really does allow hot reloading, it is too complicated to be worth it.
IPython comes with the autoreload extension, which automatically reloads modified modules before executing your code. It works at least in simple cases, but don't rely on it too much: in my experience, an interpreter restart is still required from time to time, especially when code changes occur only in indirectly imported code.
Usage example from the linked page:
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: from foo import some_function
In [4]: some_function()
Out[4]: 42
In [5]: # open foo.py in an editor and change some_function to return 43
In [6]: some_function()
Out[6]: 43
For Python 3.4 and above:
import importlib
importlib.reload(<package_name>)
from <package_name> import <method_name>
Refer to the importlib documentation for details.
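Filling in the placeholders with a real stdlib module, the pattern looks like this (json stands in for <package_name>):

```python
import importlib
import json                    # stand-in for <package_name>

json = importlib.reload(json)  # re-executes the module's code in place
from json import dumps         # re-bind any names imported with "from"

print(dumps([1, 2]))           # [1, 2]
```

Note the second line: any name previously bound with a from-import must be re-bound after the reload, or it will keep pointing at the old object.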
There are some really good answers here already, but it is worth knowing about dreload, a function available in IPython which does a "deep reload". From the documentation:
The IPython.lib.deepreload module allows you to recursively reload a
module: changes made to any of its dependencies will be reloaded
without having to exit.
http://ipython.org/ipython-doc/dev/interactive/reference.html#dreload
It is available as a "global" in IPython notebook (at least my version, which is running v2.0).
HTH
You can use the import hook machinery described in PEP 302 to load, instead of the modules themselves, some kind of proxy object that will allow you to do anything you want with the underlying module object: reload it, drop the reference to it, and so on.
An additional benefit is that your existing code will not require changes, and this extra module functionality can be torn out from a single point in the code: the place where you actually add your finder to sys.meta_path.
Some thoughts on implementation: create a finder that agrees to find any module except built-ins (you have nothing to do with built-in modules), then create a loader that returns a proxy object, subclassed from types.ModuleType, instead of the real module object. Note that a loader is not forced to store explicit references to the loaded modules in sys.modules, but it is strongly encouraged, because, as you have already seen, things may fail unexpectedly otherwise. The proxy object should catch and forward all __getattr__, __setattr__ and __delattr__ calls to the underlying real module it keeps a reference to. You will probably not need to define __getattribute__, because you are not hiding the real module's contents behind your proxy methods. Now you need some way to communicate with the proxy: you can create a special method that drops the underlying reference; then import the module, extract the reference from the returned proxy, drop the proxy, and hold a reference to the reloaded module. Phew, it looks scary, but it should fix your problem without restarting Python each time.
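A minimal sketch of the proxy idea, without the full PEP 302 finder/loader machinery: a types.ModuleType subclass that forwards attribute access to a real module and can swap it for a freshly imported copy. ModuleProxy and its swap method are invented names for this sketch, and __setattr__/__delattr__ forwarding is omitted for brevity (Python 3):

```python
import importlib
import sys
import types

class ModuleProxy(types.ModuleType):
    """Forward attribute access to a real module that can be swapped out."""

    def __init__(self, name):
        super().__init__(name)
        self.__dict__['_real'] = importlib.import_module(name)

    def __getattr__(self, attr):
        # Called only for attributes not found on the proxy itself.
        return getattr(self.__dict__['_real'], attr)

    def swap(self):
        # Drop the cached module and import a fresh copy from disk.
        name = self.__dict__['_real'].__name__
        sys.modules.pop(name, None)
        self.__dict__['_real'] = importlib.import_module(name)

json = ModuleProxy('json')
print(json.dumps([1]))   # [1] -- behaves like the real module
json.swap()              # underlying module re-imported from disk
print(json.dumps([1]))   # [1] -- still works through the proxy
```

Code that holds the proxy keeps working across the swap; only from-imports taken out of the proxy before the swap would go stale, as with every other approach here.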
I am using PythonNet in my project. Fortunately, I found that there is a command which solves this problem perfectly.
using (Py.GIL())
{
    dynamic mod = Py.Import(this.moduleName);
    if (mod == null)
        throw new Exception(string.Format("Cannot find module {0}. The Python script may not have compiled successfully, or the module name is illegal.", this.moduleName));

    // This command works perfectly for me!
    PythonEngine.ReloadModule(mod);

    dynamic instance = mod.ClassName();
}
Think twice before quitting and restarting in production.
The easy solution, without quitting and restarting, is to use reload from imp (note that imp is deprecated since Python 3.4 in favour of importlib.reload):
import moduleA, moduleB
from imp import reload
reload(moduleB)

How bad is it to explicitly add things to locals()?

I'm trying to dynamically load modules as explained here.
I have written a script that requires some modules that may not be installed by default on some systems (such as requests). The script code assumes that a regular import has been done (it uses requests.get).
If I use the code in the link above, to import requests I would have to use:
requests=importlib.import_module('requests')
But this leads to a lot of code duplication since I have several modules. I can't use that in a loop since the variable name must change with the imported module.
I have found that I can use:
for m in list_of_modules:
    locals()[m] = importlib.import_module(m)
And everything happens as if I had done regular imports.
(of course the real code catches exceptions...).
So the question is how valid/risky this is? Too good to be true or not? (Python 2.7 if that makes a difference)
It is explicitly invalid. The documentation for Python 2.7.15 says of the locals() function:
The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter.
locals() is a way for the program to know the list of variables in a function block. It is not a way to create local variables.
If you really need something like that, you can either use a local map, rely on the sys.modules map (which is updated by import_module), or update the globals() map. Anyway, once a module is loaded, it exists (through the sys.modules map) for the whole program, so it does not really make sense to store its reference in a local symbol table.
So if you really need to import a dynamically built list of modules, I would do:
for m in list_of_modules:
    globals()[m] = importlib.import_module(m)
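As a runnable illustration, with stdlib modules standing in for the hypothetical list_of_modules (this works at module level; inside a function, globals() would create module-level names, not locals):

```python
import importlib

list_of_modules = ['json', 'math']   # placeholder names for the example

for m in list_of_modules:
    globals()[m] = importlib.import_module(m)

# The names now behave as if created by regular import statements:
print(math.sqrt(9.0))    # 3.0
print(json.dumps([1]))   # [1]
```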

How to share same variable between imported modules

There are two Python scripts: master.py and to_be_imported.py
Here is the master.py:
import os
os.foo = 12345
import to_be_imported
And here is the to_be_imported.py:
import os
if hasattr(os, 'foo'):
    print 'os hasattr foo: %s' % os.foo
Now when I run master.py I get this:
os hasattr foo: 12345
indicating that the imported module to_be_imported.py picks up the variable declared inside the process that imported it (master.py).
While it works fine I would like to know why it works and also to make sure it is a safe practice.
If a module has already been imported, subsequent imports of the module use the cached version, even if you refer to it by different names, as in the following case:
import os as a
import os as b
Both refer to the same os module that was imported the first time, so it is no surprise that an attribute assigned to the module is shared.
You can verify this using the built-in Python function id().
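A quick check that the two names are literally the same object, and that an attribute set through one alias is visible through the other (foo is an arbitrary attribute name, as in the question):

```python
import os as a
import os as b

print(id(a) == id(b))   # True: one cached module object, two names
a.foo = 12345           # hypothetical attribute, as in the question
print(b.foo)            # 12345: visible through every alias
```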
Nothing is a bad idea per se, but you must remember a few things:
Modules are objects in Python. They are loaded only once and added to sys.modules. Like regular objects, they can be given extra attributes (with no messy setattr implementation needed).
Since they are objects, but not instantiable ones, you must consider them as singletons (they are singletons, after all), and you must weigh the disadvantages and benefits of such a model:
a. A singleton is a single object. Are you sure that accessing its attributes is concurrency-safe?
b. Modules are global objects. Are you sure you can track all the behavior and access to their members? Are you sure you will be able to debug errors there?
Is the code something you will work on with others?
While no idea is better than another, good practice tells us that using global variables is frowned upon, especially if we have a team to work with. On the other hand, if your code is concurrent and/or reentrant, avoid global variables and relying on module attributes. Otherwise, you will have no problem assigning attributes like that; they will last for the life of your script's execution.
This is not the place to choose the best alternative. Depending on how you state your problem, you could ask on Programmers or Code Review. There are many ways to share state without module-level globals, such as passing the state back and forth through function arguments, or learning and using OOP. But, again, that is out of scope for this site.

How to call methods of the instance of a library already in scope of the current running test case

I have a library that interfaces with an external tool and exposes some basic keywords to use from Robot Framework. This library is implemented as a Python package, and I would like to implement extended functionality, which implements complex logic and exposes more keywords, within modules of this package. The package is given test case scope, but I'm not entirely sure how this works. If I suggest a few ways I have thought of, could someone with a bit more knowledge let me know where I'm on the right track and where I'm barking up the wrong tree?
Use an instance variable - if the scope is such that the Python interpreter will see the package as imported by the current test case (i.e. this is treated as a separate package in different test cases rather than a separate instance of the same package), then on initialisation I could set a global variable INSTANCE to self and then, from another module within the package, import INSTANCE and use it.
Use an instance dictionary - if the scope is such that all imports see the package as the same, I could use robot.running.context to set a dictionary key such that there is an item in the instance dictionary for each context where the package has been imported - this would then mean that I could use the same context variable as a lookup key in the modules that are based on this. (The disadvantage of this one is that it will prevent garbage collection until the package itself is out of scope, and relies on it being in scope persistently.)
A context variable that I am as of yet unaware of that will give me the instance that is in scope. The docs are fairly difficult to search, so it's fully possible that there is something that I'm missing that will make this trivial. Also just as good would be something that allowed me to call the keywords that are in scope.
Some excellent possibility I haven't considered....
So can anyone help?
Credit for this goes to Kevin O. from the robotframework user group, but essentially the magic lives in robot.libraries.BuiltIn.BuiltIn().get_library_instance(library_name) which can be used like this:
from robot.libraries.BuiltIn import BuiltIn

class SeleniumTestLibrary(object):
    def element_should_be_really_visible(self, locator):
        s2l = BuiltIn().get_library_instance('Selenium2Library')
        element = s2l._element_find(locator, True, False)
It sounds like you are talking about monkeypatching the imported code, so that other modules which import that package will also see your runtime modifications. (Correct me if I'm wrong; there are a couple of bits in your question that I'm not quite following)
For simple package imports, this should work:
import my_package

def method_override():
    return "Foo"

my_package.some_method = method_override
my_package, in this case, refers to the imported module, and is not just a local name, so other modules will see the overridden method.
This won't work in cases where other code has already done
from my_package import some_method
Since in that case, some_method is a local name in the place it is imported. If you replace the method elsewhere, that change won't be seen.
If this is happening, then you either need to change the source to import the entire module, or patch a little bit deeper, by replacing method internals:
import my_package

def method_override():
    return "Foo"

# In Python 2 the attribute is func_code; in Python 3 it is __code__.
my_package.some_method.func_code = method_override.func_code
At that point, it doesn't matter how the method was imported in any other module; the code object associated with the method has been replaced, and your new code will run rather than the original.
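A self-contained sketch of the code-object swap, using invented functions for the demo (written for Python 3, where the attribute is __code__):

```python
def original(x):
    return x + 1

def replacement(x):
    return x * 10

alias = original        # simulates "from my_package import some_method"

# Rebinding the module-level name would NOT affect the alias,
# but swapping the code object does:
original.__code__ = replacement.__code__

print(alias(3))         # 30 -- the alias now runs the replacement's code
```

This only works when the two functions have compatible free variables; closures with different cell layouts cannot have their code objects swapped this way.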
The only thing to worry about in that case is that the module is imported from the same path in every case. The Python interpreter will try to reuse existing modules, rather than re-import and re-initialize them, whenever they are imported from the same path.
However, if your python path is set up to contain two directories, say: '/foo' and '/foo/bar', then these two imports
from foo.bar import baz
and
from bar import baz
would end up loading the module twice, and defining two versions of any objects (methods, classes, etc) in the module. If that happens, then patching one will not affect the other.
If you need to guard against that case, then you may have to traverse sys.modules, looking for the imported package, and patching each version that you find. This, of course, will only work if all of the other imports have already happened, you can't do that pre-emptively (without writing an import hook, but that's another level deeper again :) )
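Scanning sys.modules for duplicate copies loaded from the same file could look like this (find_module_copies is a hypothetical helper name for the sketch):

```python
import sys
import types

def find_module_copies(module_file):
    """Return every loaded (name, module) pair whose source is module_file."""
    return [(name, mod) for name, mod in list(sys.modules.items())
            if isinstance(mod, types.ModuleType)
            and getattr(mod, '__file__', None) == module_file]

import json
copies = find_module_copies(json.__file__)
print([name for name, mod in copies])   # ['json'] unless it was loaded twice
```

Each module object found this way would then need to be patched individually.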
Are you sure you can't just fork the original package and extend it directly? That would be much easier :)

Is there anyway to clear python bytecode cache?

Each unit test I'm running writes Python code out to a file, then imports it as a module. The problem is that the code changes, but further import statements don't pick up the modified module.
I think what I need is a way to either force a reload of a module or clear the internal bytecode cache. Any ideas?
Thanks!
Reimporting modules is tricky to get all the edge cases right. The documentation for reload mentions some of them. Depending on what you are testing, you may be better off by testing the imports with separate invocations of the interpreter by running each via, say, subprocess. It will likely be slower but also likely safer and more accurate testing.
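A sketch of the subprocess approach: write the generated module, then import and exercise it in a brand-new interpreter, so no module or bytecode caching from the test process can interfere (generated.py and ANSWER are made-up names for the sketch):

```python
import os
import subprocess
import sys
import tempfile

d = tempfile.mkdtemp()
with open(os.path.join(d, 'generated.py'), 'w') as f:
    f.write('ANSWER = 42\n')

# A fresh interpreter has an empty module cache, so the import
# always reflects the current contents of the file.
out = subprocess.check_output(
    [sys.executable, '-c', 'import generated; print(generated.ANSWER)'],
    cwd=d)
print(out.decode().strip())   # 42
```

This relies on python -c putting the current working directory on sys.path, which it does by default.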
Use reload().
Reload a previously imported module. The argument must be a module object, so it must have been successfully imported before. This is useful if you have edited the module source file using an external editor and want to try out the new version without leaving the Python interpreter. The return value is the module object (the same as the module argument).
However, the module needs to be already loaded. A workaround is to handle the resulting NameError:
try:
    reload(math)
except NameError:
    import math
Write your code to differently-named modules. Writing new code into an existing file, and trying to import it again will not work well.
Alternatively, you can clobber sys.modules. For example:
import sys
import unittest

class MyTestCase(unittest.TestCase):
    def setUp(self):
        # Record sys.modules here so we can restore it in tearDown.
        self.old_modules = dict(sys.modules)

    def tearDown(self):
        # Remove any new modules imported during the test run. This lets us
        # import the same source files for more than one test.
        for m in [m for m in sys.modules if m not in self.old_modules]:
            del sys.modules[m]
Ran into a similar situation.
Later on, I found that the whitespace indentation technique used matters.
Especially on Windows platforms, ensure that a uniform technique is adopted throughout the module, i.e., use either tabs or spaces exclusively.
