While developing a largish project (split across several files and folders) in Python with IPython, I run into the trouble of cached imported modules.
The problem is that an import module statement only reads the module once, even if that module has changed! So each time I change something in my package, I have to quit and restart IPython. Painful.
Is there any way to properly force reloading some modules? Or, better, to somehow prevent Python from caching them?
I tried several approaches, but none works. In particular I run into really, really weird bugs, like some modules or variables mysteriously becoming equal to None...
The only sensible resource I found is Reloading Python modules, from pyunit, but I have not checked it. I would like something like that.
A good alternative would be for IPython to restart, or restart the Python interpreter somehow.
So, if you develop in Python, what solution have you found to this problem?
Edit
To make things clear: obviously, I understand that some old variables depending on the previous state of the module may stick around. That's fine by me. But why is it so difficult in Python to force-reload a module without all sorts of strange errors happening?
More specifically, if I have my whole module in one file module.py then the following works fine:
import sys
try:
    del sys.modules['module']
except KeyError:
    pass
import module
obj = module.my_class()
This piece of code works beautifully and I can develop without quitting IPython for months.
However, whenever my module is made of several submodules, hell breaks loose:
import sys
for mod in ['module.submod1', 'module.submod2']:
    try:
        del sys.modules[mod]
    except KeyError:
        pass
# sometimes this works, sometimes not. WHY?
Why is that so different for Python whether I have my module in one big file or in several submodules? Why would that approach not work??
import checks to see if the module is in sys.modules, and if it is, it returns it. If you want import to load the module fresh from disk, you can delete the appropriate key in sys.modules first.
There is the reload builtin function which will, given a module object, reload it from disk, and that will get placed in sys.modules. Edit -- actually, it will recompile the code from the file on disk and then re-evaluate it in the existing module's __dict__. That is potentially very different from making a new module object.
Mike Graham is right though; getting reloading right is hard if you have even a few live objects that reference the contents of the module you no longer want. That existing objects will still reference the classes they were instantiated from is an obvious issue, but all references created by means of from module import symbol will also still point to whatever object belonged to the old version of the module. Many subtly wrong things are possible.
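A minimal sketch of that from-import pitfall, using a scratch module written to a temporary directory (the module name mymod is made up for illustration):

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True          # avoid stale .pyc confusion in this demo

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'mymod.py')
with open(path, 'w') as f:
    f.write('VALUE = 1\n')
sys.path.insert(0, tmpdir)

import mymod
from mymod import VALUE                 # binds the *current* object into this namespace

with open(path, 'w') as f:
    f.write('VALUE = 2\n')
importlib.reload(mymod)

print(mymod.VALUE)                      # 2: attribute lookup goes through the reloaded module
print(VALUE)                            # 1: the from-import still sees the old object
```

Attribute access through the module object picks up the reload; names bound by from-import do not.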
Edit: I agree with the consensus that restarting the interpreter is by far the most reliable thing. But for debugging purposes, I guess you could try something like the following. I'm certain that there are corner cases for which this wouldn't work, but if you aren't doing anything too crazy with module loading in your package, it might be useful.
import sys
import types

def reload_package(root_module):
    package_name = root_module.__name__

    # get a reference to each loaded module
    loaded_package_modules = dict([
        (key, value) for key, value in sys.modules.items()
        if key.startswith(package_name) and isinstance(value, types.ModuleType)])

    # delete references to these loaded modules from sys.modules
    for key in loaded_package_modules:
        del sys.modules[key]

    # load each of the modules again;
    # make old modules share state with new modules
    for key in loaded_package_modules:
        print('reloading %s' % key)
        # __import__('a.b') returns the top-level package, so take the
        # freshly loaded submodule from sys.modules instead
        __import__(key)
        newmodule = sys.modules[key]
        oldmodule = loaded_package_modules[key]
        oldmodule.__dict__.clear()
        oldmodule.__dict__.update(newmodule.__dict__)
Which I very briefly tested like so:
import email, email.mime, email.mime.application
reload_package(email)
printing:
reloading email.iterators
reloading email.mime
reloading email.quoprimime
reloading email.encoders
reloading email.errors
reloading email
reloading email.charset
reloading email.mime.application
reloading email._parseaddr
reloading email.utils
reloading email.mime.base
reloading email.message
reloading email.mime.nonmultipart
reloading email.base64mime
Quitting and restarting the interpreter is the best solution. Any sort of live reloading or no-caching strategy will not work seamlessly: objects from no-longer-existing modules can linger, modules sometimes store state, and even if your use case really does allow hot reloading, it's too complicated to think about to be worth it.
IPython comes with the autoreload extension, which automatically reloads changed modules before executing code. It works at least in simple cases, but don't rely on it too much: in my experience, an interpreter restart is still required from time to time, especially when code changes occur only in indirectly imported code.
Usage example from the linked page:
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: from foo import some_function
In [4]: some_function()
Out[4]: 42
In [5]: # open foo.py in an editor and change some_function to return 43
In [6]: some_function()
Out[6]: 43
For Python version 3.4 and above:
import importlib
importlib.reload(<package_name>)
from <package_name> import <method_name>
Refer to the documentation for details.
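A concrete sketch of the pattern, using the standard library json module as a stand-in for your own package:

```python
import importlib
import json                     # stand-in for your own package

importlib.reload(json)          # re-executes the module's source
from json import dumps          # rebind any from-imported names after the reload
print(dumps({'a': 1}))          # -> {"a": 1}
```

Remember that names bound with from ... import ... before the reload still point at the old objects, which is why they are re-imported afterwards.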
There are some really good answers here already, but it is worth knowing about dreload, a function available in IPython which does a "deep reload". From the documentation:
The IPython.lib.deepreload module allows you to recursively reload a
module: changes made to any of its dependencies will be reloaded
without having to exit. To start using it, do:
http://ipython.org/ipython-doc/dev/interactive/reference.html#dreload
It is available as a "global" in IPython notebook (at least my version, which is running v2.0).
HTH
You can use the import hook machinery described in PEP 302 to load not the modules themselves but some kind of proxy object, which will let you do anything you want with the underlying module object: reload it, drop the reference to it, and so on.
An additional benefit is that your existing code will not require any change, and this extra functionality can be torn out at a single point in the code: where you actually add the finder to sys.meta_path.
Some thoughts on implementing it: create a finder that agrees to find any module except built-in ones (you have nothing to do with built-in modules), then create a loader that returns a proxy object subclassed from types.ModuleType instead of the real module object. Note that the loader is not forced to put explicit references to loaded modules into sys.modules, but it's strongly encouraged, because, as you have already seen, things may fail unexpectedly otherwise. The proxy object should catch and forward all __getattr__, __setattr__ and __delattr__ calls to the underlying real module it keeps a reference to. You probably won't need to define __getattribute__, because you are not hiding the real module's contents behind your proxy methods. Now you need some way to communicate with the proxy: you can create a special method that drops the underlying reference, then import the module, extract the reference from the returned proxy, drop the proxy, and hold a reference to the reloaded module. Phew, it looks scary, but it should fix your problem without restarting Python each time.
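A rough sketch of just the proxy part of that idea (the finder/loader wiring is omitted, and the names ModuleProxy and _swap are made up here):

```python
import importlib
import sys
import types

class ModuleProxy(types.ModuleType):
    """Stands in for a real module and forwards attribute access to it."""

    def __init__(self, module):
        super().__init__(module.__name__, getattr(module, '__doc__', None))
        self.__dict__['_real'] = module          # direct dict write bypasses __setattr__

    def __getattr__(self, name):                 # called for names not on the proxy itself
        return getattr(self.__dict__['_real'], name)

    def __setattr__(self, name, value):
        real = self.__dict__.get('_real')
        if real is None:                         # still constructing the proxy
            self.__dict__[name] = value
        else:
            setattr(real, name, value)

    def __delattr__(self, name):
        delattr(self.__dict__['_real'], name)

    def _swap(self):
        """Reload the underlying module; everyone holding the proxy sees the new one."""
        real = self.__dict__['_real']
        sys.modules[real.__name__] = real        # reload() insists on finding it here
        try:
            self.__dict__['_real'] = importlib.reload(real)
        finally:
            sys.modules[real.__name__] = self    # put the proxy back

# usage: wrap an already-imported module
import json
proxy = ModuleProxy(json)
sys.modules['json'] = proxy                      # later imports now yield the proxy
print(proxy.dumps({'a': 1}))                     # -> {"a": 1}, forwarded to the real json
proxy._swap()                                    # reload json behind the proxy
```

Because all code holds the proxy rather than the real module, swapping the underlying module reference updates every user at once, which is the point of the scheme.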
I am using PythonNet in my project. Fortunately, I found a command which solves this problem perfectly.
using (Py.GIL())
{
    dynamic mod = Py.Import(this.moduleName);
    if (mod == null)
        throw new Exception(string.Format("Cannot find module {0}. Python script may not be compiled successfully or module name is illegal.", this.moduleName));
    // This command works perfectly for me!
    PythonEngine.ReloadModule(mod);
    dynamic instance = mod.ClassName();
}
Think twice before quitting and restarting in production
The easy solution without quitting and restarting is to use reload from imp (deprecated since Python 3.4; prefer importlib.reload there):
import moduleA, moduleB
from imp import reload
reload(moduleB)
Related
Suppose in main.py I have
from tools import myconstant
and in tools.py I have:
myconstant = 15
Now I add something to tools.py:
myconstant = 15
myotherconstant = 25
Save the file and in main.py I run (still from the same python session):
from tools import myotherconstant
which throws an error. If I restart my Python session, it works. Why is that? I know that usually one would run the main file from a fresh session and never have this 'problem', but I am still curious as to why Python cannot detect changes in files. If I change the contents of a CSV file and read it in again, it will reflect the updated data. What is the difference here?
There is no problem to solve here, I am just asking for clarification of python concepts :) Some minimal examples that demonstrate the inner workings would be appreciated.
Modules are only loaded once per session, which is a good thing. You can use importlib.reload():
import importlib
import tools
importlib.reload(tools)
A restart is not specifically required; you just need to reload the module.
Python caches the modules that are imported. From the docs:
The first place checked during import search is sys.modules. This
mapping serves as a cache of all modules that have been previously
imported, including the intermediate paths.
You can delete the module from the cache by deleting the dictionary key from sys.modules and do the import again. Although, this is not the recommended approach (as other modules may hold references to it).
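For example, using the standard library timeit module as a stand-in, deleting the cache entry forces the next import to re-execute the module, while old references keep pointing at the stale object:

```python
import sys
import timeit

old = timeit
del sys.modules['timeit']       # evict the cached module

import timeit                   # executed again from disk: a brand-new module object
print(timeit is old)            # False
print(old.default_timer)        # old references still point at the stale module
```

This is exactly why the approach is discouraged: any module that imported timeit earlier still holds the old object.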
You should use importlib.reload
From the docs
Reload a previously imported module. The argument must be a module object, so it must have been successfully imported before. This is useful if you have edited the module source file using an external editor and want to try out the new version without leaving the Python interpreter.
Example,
import xyz # reload this
import importlib
importlib.reload(xyz)
Or
from xyz import pqr
import importlib
import sys
importlib.reload(sys.modules.get('xyz'))
In your example, you need to do
import importlib
import sys
importlib.reload(sys.modules.get('tools'))
Doing a restart will clear the cache too, but it is not required.
First of all, I must tell you that I have already looked at this bug, and I understand that the feature is (in general) not going to be possible for a long time. However, I have a use case which is very specific, so I will describe it and ask for suggestions.
I am writing an interactive Python application which is run from the interpreter. The user might make a mistake or two when importing modules, and I would like to provide a method for the user to delete a module right after importing it.
So one of the usual problems, references to the module already being incorporated into other objects, is gone. Once I am sure that the module has not been used at all, what can I do to remove it? Is it still technically possible?
I was thinking that if I could create a function which manually deletes every object/function created by the module when imported, then what I want might be accomplished. Is this true?
IPython does a similar operation with its extensions. Is that the correct way to go?
Modules are just a namespace, an object, stored in the sys.modules mapping.
If there are no other references to anything belonging in that module, you can remove it from the sys.modules mapping to delete it again:
>>> import timeit
>>> import sys
>>> del timeit # remove local reference to the timeit module
>>> del sys.modules['timeit'] # remove module
In the program I am writing, I created a module called settings that declares a few constants but loads others from a configuration file, placing them in the module namespace (for example: the value of π might be in the code of the module, but the weight of the user in a configuration file).
This is such that in other modules, I can do:
from settings import *
Everything works fine for me but - using Aptana Studio / PyDev, the code analysis tool throws a lot of undefined variable errors like this:
I found here that there is a flag that can be used to suppress these errors in class docstrings, but it has no effect if I try to use it at module level. So I wonder two things:
Is there a way to selectively get rid of these errors (meaning that I wouldn't want to completely turn off the option "mark as errors the undefined variables": in other modules it could in fact be an error)?
If not, is there an alternative pattern to achieve what I want in terms of wild imports, but without confusing the code analysis tool?
Pre-emptive note: I am perfectly aware of the fact wild imports are discouraged.
Actually you'd probably have the same error even if it wasn't a wild import (i.e.: import settings / settings.MY_VARIABLE would still show an error because the code-analysis can't find it).
Aside from the #UndefinedVariable in each place that references it (CTRL+1 will show that option), I think that a better pattern for your module would be:
MY_VARIABLE = 'default value'
...
update_default_values() # Go on and override the defaults.
That way, the code-analysis (and anyone reading your module), would know which variables are expected.
Otherwise, if you don't know them beforehand, I think a better approach would be having a method (i.e.: get_settings('MY_VARIABLE')).
Unrelated to the actual problem: I'd really advise against using a wild import here (or even importing the constant, i.e.: from settings import MY_VARIABLE).
A better approach for a settings module is always using:
import settings
settings.MY_VARIABLE
(because otherwise, if any place decides it wants to change MY_VARIABLE, any place that has put the reference in its own namespace will probably never see the changed value).
An even safer approach would be having a method get_setting('var'), as it would allow you to a better lazy-loading of your preferences (i.e.: don't load on import, but when it's called the 1st time).
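A minimal sketch of that lazy-loading idea (the function name get_setting, the JSON format, and the settings.json path are assumptions for illustration):

```python
import json

_cache = None

def get_setting(name, path='settings.json'):
    """Return one setting, loading the configuration file on first access."""
    global _cache
    if _cache is None:                   # not loaded yet: read the file once
        with open(path) as f:
            _cache = json.load(f)
    return _cache[name]
```

Since callers always go through the function, nothing is loaded at import time, and re-reading the configuration is just a matter of resetting _cache to None.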
You can use Ctrl-1 on an error and choose #UndefinedVariable or type ##UndefinedVariable on a line that has an error you want to ignore.
You can try to add your module to be scanned by the PyDev interpreter by going to Window > Preferences, then PyDev > Interpreter - Python. Under the Libraries tab, click New Folder and browse to the directory that contains settings, then click Apply. Hopefully, Pydev will find your package and recognize the wildly-imported variables.
How does one get (find the location of) the dynamically imported modules from a Python script?
From my understanding, Python can dynamically (at run time) load modules:
be it using __import__(module_name), using exec "from x import y", or using imp.find_module("module_name") followed by imp.load_module(param1, param2, param3, param4).
Given that, I want to get all the dependencies of a Python file. This would include getting (or at least trying to get) the dynamically loaded modules, those loaded either from hard-coded string objects or from strings returned by a function/method.
For a normal import module_name and from x import y you can either do a manual scan of the code or use modulefinder.
So if I want to copy one python script and all its dependencies (including the custom dynamically loaded modules) how should I do that ?
You can't; the very nature of programming (in any language) means that you cannot predict what code will be executed without actually executing it. So you have no way of telling which modules could be included.
This is further confused by user-input, consider: __import__(sys.argv[1]).
There's a lot of theoretical information about the first problem, which is normally described as the Halting problem, the second just obviously can't be done.
From a theoretical perspective, you can never know exactly what/where modules are being imported. From a practical perspective, if you simply want to know where the modules are, check the module.__file__ attribute or run the script under python -v to find files when modules are loaded. This won't give you every module that could possibly be loaded, but will get most modules with mostly sane code.
See also: How do I find the location of Python module sources?
This is not possible to do 100% accurately. I answered a similar question here: Dependency Testing with Python
Just an idea and I'm not sure that it will work:
You could write a module that contains a wrapper for __builtin__.__import__. This wrapper would save a reference to the old __import__ and then assign a function to __builtin__.__import__ that does the following:
whenever called, get the current stacktrace and work out the calling function. Maybe the information in the globals parameter to __import__ is enough.
get the module of that calling function, and store the name of this module and what will get imported
redirect the call to the real __import__
After you have done this you can call your application with python -m magic_module yourapp.py. The magic module must store the information somewhere where you can retrieve it later.
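A sketch of that wrapper idea on Python 3, where the hook lives in builtins rather than __builtin__ (the names recorded and _tracing_import are made up; real code would persist the log somewhere):

```python
import builtins

recorded = []                           # (importing module, imported name) pairs
_real_import = builtins.__import__

def _tracing_import(name, globals=None, locals=None, fromlist=(), level=0):
    caller = (globals or {}).get('__name__', '?')   # who triggered the import
    recorded.append((caller, name))
    return _real_import(name, globals, locals, fromlist, level)

builtins.__import__ = _tracing_import
import json                             # this import statement is now recorded
builtins.__import__ = _real_import      # restore the original hook

print(recorded)
```

Note that __import__ is invoked for every import statement, even when the module is already cached in sys.modules, so cached imports are logged too.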
That's quite a question.
Static analysis is about predicting all possible run-time execution paths and making sure the program halts for specific input at all, which is equivalent to the Halting Problem, and unfortunately there is no generic solution.
The only way to resolve dynamic dependencies is to run the code.
Each unit test I'm running writes Python code out to a file, then imports it as a module. The problem is that the code changes, but further import statements don't modify the module.
I think what I need is a way to either force a reload of the module or clear the internal bytecode cache. Any ideas?
Thanks!
Reimporting modules is tricky to get all the edge cases right. The documentation for reload mentions some of them. Depending on what you are testing, you may be better off by testing the imports with separate invocations of the interpreter by running each via, say, subprocess. It will likely be slower but also likely safer and more accurate testing.
Use reload().
Reload a previously imported module. The argument must be a module object, so it must have been successfully imported before. This is useful if you have edited the module source file using an external editor and want to try out the new version without leaving the Python interpreter. The return value is the module object (the same as the module argument).
However, the module needs to be already loaded. A workaround is to handle the resulting NameError:
try:
    reload(math)
except NameError:
    import math
Write your code to differently-named modules. Writing new code into an existing file, and trying to import it again will not work well.
Alternatively, you can clobber sys.modules. For example:
import sys
import unittest

class MyTestCase(unittest.TestCase):
    def setUp(self):
        # Record sys.modules here so we can restore it in tearDown.
        self.old_modules = dict(sys.modules)

    def tearDown(self):
        # Remove any new modules imported during the test run. This lets us
        # import the same source files for more than one test.
        for m in [m for m in sys.modules if m not in self.old_modules]:
            del sys.modules[m]
Ran into a similar situation.
Later on, I found that the whitespace indentation technique used matters.
Especially on Windows platforms, ensure that a uniform technique is adopted throughout the module, i.e., use either tabs or spaces exclusively.