Is __main__ guaranteed to always be importable?

Is there any case where doing:
import __main__
might lead to an ImportError? All cases I've tried seem to indicate that this always works. The docs on __main__ don't seem to state anything on the matter.
To give some context: I am trying to inject some names into __main__.__dict__ using the usercustomize hook in order to (mainly) have them available when the REPL fires up.
Granted that no redefinitions of __import__ occur (as a comment stated), this essentially boils down to whether I need to wrap it in a try/except or not.
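For illustration, a minimal usercustomize.py along the lines described might look like this (the injected names are placeholders, and the try/except is exactly the guard being asked about):
# usercustomize.py -- a sketch of injecting names into the REPL's namespace
try:
    import __main__
except ImportError:
    pass  # the guard in question: is it ever actually needed?
else:
    import pprint
    # hypothetical convenience name made available in the interactive session
    __main__.__dict__.setdefault('pp', pprint.pprint)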

It probably is. Python initializes __main__ in this file:
https://github.com/python/cpython/blob/master/Python/pylifecycle.c#L1327
However, please note that modules like runpy and IPython replace the __main__ module with their own dynamically created ones, to avoid collisions with their launch scripts and, in runpy's case, to provide the expected script-execution behaviour.
runpy itself is part of the Python Standard Library and provides the implementation of the -m flag, which allows arbitrary modules to be executed as scripts.
An alternative is IPython, which offers a feature to execute code at the launch of a new REPL.
For more details, see here: http://ipython.readthedocs.io/en/stable/config/intro.html?highlight=exec_lines
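For a rough idea of that IPython feature, exec_lines can be set in a profile's ipython_config.py (the pre-loaded names below are only examples):
# ~/.ipython/profile_default/ipython_config.py
c = get_config()
c.InteractiveShellApp.exec_lines = [
    'import os, sys',   # example code to run at the start of every new REPL
]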

Importing variable from another file changes it

While developing a largeish project (split in several files and folders) in Python with IPython, I run into the trouble of cached imported modules.
The problem is that an import module statement only reads the module once, even if that module has changed since! So each time I change something in my package, I have to quit and restart IPython. Painful.
Is there any way to properly force reloading some modules? Or, better, to somehow prevent Python from caching them?
I tried several approaches, but none works. In particular I run into really, really weird bugs, like some modules or variables mysteriously becoming equal to None...
The only sensible resource I found is Reloading Python modules, from pyunit, but I have not checked it. I would like something like that.
A good alternative would be for IPython to restart, or restart the Python interpreter somehow.
So, if you develop in Python, what solution have you found to this problem?
Edit
To make things clear: obviously, I understand that some old variables depending on the previous state of the module may stick around. That's fine by me. But why is it so difficult in Python to force reload a module without having all sorts of strange errors happening?
More specifically, if I have my whole module in one file module.py then the following works fine:
import sys
try:
    del sys.modules['module']
except KeyError:
    pass
import module
obj = module.my_class()
This piece of code works beautifully and I can develop without quitting IPython for months.
However, whenever my module is made of several submodules, hell breaks loose:
import sys
for mod in ['module.submod1', 'module.submod2']:
    try:
        del sys.modules[mod]
    except KeyError:
        pass
# sometimes this works, sometimes not. WHY?
Why does it make such a difference to Python whether my module is in one big file or in several submodules? Why would that approach not work?
import checks to see if the module is in sys.modules, and if it is, it returns it. If you want import to load the module fresh from disk, you can delete the appropriate key in sys.modules first.
There is the reload builtin function which, given a module object, will reload it from disk and place the result in sys.modules. Edit -- actually, it will recompile the code from the file on disk and then re-evaluate it in the existing module's __dict__, which is potentially very different from creating a new module object.
Mike Graham is right though; getting reloading right is hard if you have even a few live objects that reference the contents of the module you no longer want. That existing objects still reference the classes they were instantiated from is an obvious issue, but all references created by means of from module import symbol will also still point to the object from the old version of the module. Many subtly wrong things are possible.
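A minimal sketch of that last point, using a hypothetical single-file module mymod that defines a name CONSTANT:
import importlib
import mymod                  # hypothetical module defining CONSTANT
from mymod import CONSTANT    # binds the current object to a local name

# ... edit mymod.py so that CONSTANT gets a new value ...

importlib.reload(mymod)       # re-executes mymod.py in the existing module dict
print(mymod.CONSTANT)         # sees the new value
print(CONSTANT)               # still the object that was bound before the reload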
Edit: I agree with the consensus that restarting the interpreter is by far the most reliable thing. But for debugging purposes, I guess you could try something like the following. I'm certain that there are corner cases for which this wouldn't work, but if you aren't doing anything too crazy (otherwise) with module loading in your package, it might be useful.
import sys
import types

def reload_package(root_module):
    package_name = root_module.__name__

    # get a reference to each loaded module
    loaded_package_modules = dict([
        (key, value) for key, value in sys.modules.items()
        if key.startswith(package_name) and isinstance(value, types.ModuleType)])

    # delete references to these loaded modules from sys.modules
    for key in loaded_package_modules:
        del sys.modules[key]

    # load each of the modules again;
    # make old modules share state with new modules
    for key in loaded_package_modules:
        print('reloading %s' % key)
        newmodule = __import__(key)
        oldmodule = loaded_package_modules[key]
        oldmodule.__dict__.clear()
        oldmodule.__dict__.update(newmodule.__dict__)
Which I very briefly tested like so:
import email, email.mime, email.mime.application
reload_package(email)
printing:
reloading email.iterators
reloading email.mime
reloading email.quoprimime
reloading email.encoders
reloading email.errors
reloading email
reloading email.charset
reloading email.mime.application
reloading email._parseaddr
reloading email.utils
reloading email.mime.base
reloading email.message
reloading email.mime.nonmultipart
reloading email.base64mime
Quitting and restarting the interpreter is the best solution. Any sort of live reloading or no-caching strategy will not work seamlessly, because objects from no-longer-existing modules can still exist, because modules sometimes store state, and because even if your use case really does allow hot reloading, it's too complicated to reason about to be worth it.
IPython ships with the autoreload extension, which automatically reloads modules before each evaluation at the prompt. It works at least in simple cases, but don't rely on it too much: in my experience, an interpreter restart is still required from time to time, especially when the code changes occur only in indirectly imported code.
Usage example from the linked page:
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: from foo import some_function
In [4]: some_function()
Out[4]: 42
In [5]: # open foo.py in an editor and change some_function to return 43
In [6]: some_function()
Out[6]: 43
For Python version 3.4 and above
import importlib
importlib.reload(<package_name>)
from <package_name> import <method_name>
Refer to the importlib.reload documentation for details.
There are some really good answers here already, but it is worth knowing about dreload, a function available in IPython that does a "deep reload". From the documentation:
The IPython.lib.deepreload module allows you to recursively reload a
module: changes made to any of its dependencies will be reloaded
without having to exit. To start using it, do:
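Presumably the import it refers to is:
from IPython.lib.deepreload import reload as dreload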
http://ipython.org/ipython-doc/dev/interactive/reference.html#dreload
It is available as a "global" in IPython notebook (at least my version, which is running v2.0).
HTH
You can use the import hook machinery described in PEP 302 to load not the modules themselves but some kind of proxy object that lets you do anything you want with the underlying module object — reload it, drop the reference to it, etc.
An additional benefit is that your existing code will not require changes, and this extra module functionality can be torn out from a single point in the code — where you actually add the finder to sys.meta_path.
Some thoughts on implementing it: create a finder that will agree to find any module except builtins (you have nothing to do with builtin modules), then create a loader that returns a proxy object subclassed from types.ModuleType instead of the real module object. Note that the loader is not forced to insert explicit references to loaded modules into sys.modules, but it's strongly encouraged, because, as you have already seen, things may fail unexpectedly otherwise. The proxy object should catch and forward all __getattr__, __setattr__ and __delattr__ calls to the underlying real module it keeps a reference to. You probably won't need to define __getattribute__, because you are not hiding the real module's contents behind your proxy methods. Now you need to communicate with the proxy in some way: you can create a special method that drops the underlying reference, then import the module, extract the reference from the returned proxy, drop the proxy and hold a reference to the reloaded module. Phew, looks scary, but it should fix your problem without restarting Python each time.
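A minimal sketch of just the proxy part (the class name and the swap helper are made up for illustration, and only attribute reads are forwarded here):
import sys
import types

class ModuleProxy(types.ModuleType):
    # stands in for a real module and forwards attribute reads to it,
    # so references held by client code never have to change
    def __init__(self, real_module):
        super().__init__(real_module.__name__)
        self.__dict__['_real'] = real_module

    def __getattr__(self, name):
        return getattr(self.__dict__['_real'], name)

    def swap(self, new_module):
        # point every existing reference at a freshly imported module object
        self.__dict__['_real'] = new_module

import json
proxy = ModuleProxy(json)
sys.modules['json'] = proxy           # later "import json" statements now get the proxy
print(proxy.dumps({'answer': 42}))    # attribute access is forwarded to the real json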
I am using PythonNet in my project. Fortunately, I found there is a call that solves this problem perfectly.
using (Py.GIL())
{
    dynamic mod = Py.Import(this.moduleName);
    if (mod == null)
        throw new Exception(string.Format(
            "Cannot find module {0}. Python script may not be compiled successfully or module name is illegal.",
            this.moduleName));

    // This command works perfectly for me!
    PythonEngine.ReloadModule(mod);

    dynamic instance = mod.ClassName();
}
Think twice before quitting and restarting in production.
The easy solution, without quitting and restarting, is to use reload from imp:
import moduleA, moduleB
from imp import reload
reload(moduleB)

Recovering original argv

When a script is invoked explicitly with python, the argv is mucked with so that argv[0] is the path to the script being run. This is the case if invoked as python foo/bar.py or even as python -m foo.bar.
I need a way to recover the original argv (ie. the one received by python). Unfortunately, it's not as easy as prepending sys.executable to sys.argv because python foo/bar.py is different than python -m foo.bar (the implicit PYTHONPATH differs, which can be crucial depending on your module structure).
More specifically in the cases of python foo/bar.py some other args and python -m foo.bar some other args, I'm looking to recover ['python', 'foo/bar.py', 'some', 'other', 'args'] and ['python', '-m', 'foo.bar', 'some', 'other', 'args'], respectively.
I am aware of prior questions about this:
how to get the ORIGINAL command line in python? with spaces, tabs, etc
Full command line as it was typed
But these seem to have a misunderstanding of how shells work and the answers reflect this. I am not interested in undoing the work of the shell (eg. evaluated shell vars and functions are fine), I just want to get at the original argv given to python.
The only solution I've found is to use /proc/<PID>/cmdline:
import os
with open("/proc/{}/cmdline".format(os.getpid()), 'rb') as f:
    original_argv = f.read().split(b'\0')[:-1]
This does work, but it is Linux-only (no OSX, and Windows support seems to require installing the wmi package). Fortunately for my current use case this restriction is fine. But, it would be nice to have a cleaner, cross platform approach.
The fact that that /proc/<PID>/cmdline approach works gives me hope that python isn't execing before it runs the script (at least not the syscall exec, but maybe the exec builtin). I remember reading somewhere that all of this argument handling (ex. -m) is done in pure python, not C (this is confirmed by the fact that python -m this.does.not.exist will produce an exception that looks like it came from the runtime). So, I'd venture a guess that somewhere in pure python the original argv is available (perhaps this requires some spelunking through the runtime initialization?).
tl;dr Is there a cross-platform (builtin, preferably) way to get at the original argv passed to python (before it removes the python executable and transforms -m blah into blah.py)?
edit From spelunking, I discovered Py_GetArgcArgv, which can be accessed via ctypes (found it here, links to several SO posts that mention this approach):
import ctypes

_argv = ctypes.POINTER(ctypes.c_wchar_p)()
_argc = ctypes.c_int()
ctypes.pythonapi.Py_GetArgcArgv(ctypes.byref(_argc), ctypes.byref(_argv))
argv = _argv[:_argc.value]
print(argv)
Now this is OS-portable, but not python-implementation-portable (it only works on CPython, and ctypes is yucky if you don't need it). Also, peculiarly, I don't get the right output on Ubuntu 16.04 (python -m foo.bar gives me ['python', '-m', '-m']), but I may just be making a silly mistake (I get the same behavior on OSX). It would be great to have a fully portable solution (that doesn't dig into ctypes).
Python 3.10 adds sys.orig_argv, which the docs describe as the arguments originally passed to the Python executable. If this isn't exactly what you're looking for, it may be helpful in this or similar cases.
There were a bunch of possibilities considered, including changing sys.argv, but this was, I think, wisely chosen as the most effective and non-disruptive option.
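A quick illustration of the difference (the exact values of course depend on how the interpreter was invoked):
# Python 3.10+
import sys
print(sys.orig_argv)   # e.g. ['python', '-m', 'foo.bar', 'some', 'other', 'args']
print(sys.argv)        # e.g. ['/path/to/foo/bar.py', 'some', 'other', 'args']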
This seems like an XY problem; you are getting into the weeds in order to accommodate some existing complicated test setup (I found the question behind the question in your comment). Further effort would be better spent writing a sane test setup.
Use a better test runner, not unittest.
Create any initial state within the test setup, not in the external environment before entering the Python runtime.
Use a plugin for the randomization and seed stuff, personally I use this one but there are others.
For example if you decide to go with pytest runner, all the test setup can be configured within a [tool.pytest.ini_options] section of the pyproject.toml file and/or with a fixture defined in conftest.py. Overriding the default test configuration can be done with environment variables and/or command line arguments, and neither of these approaches will get mucked around by the shell or during Python interpreter startup.
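For instance, the seed handling could live in a conftest.py fixture roughly like this (the TEST_SEED variable and fixture name are only illustrative):
# conftest.py -- sketch only
import os
import random

import pytest

@pytest.fixture(autouse=True, scope="session")
def seeded_rng():
    # take the seed from an environment variable rather than interpreter arguments
    random.seed(int(os.environ.get("TEST_SEED", "0")))
    yield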
The manner in which to execute the test suite can and should be as simple as executing a single command:
pytest
And then your perceived problem of needing to recover the original sys.argv will go away.
Your stated problem is:
User called my app with environment variables and arguments.
I want to display a "run like this" diagnostic that will exactly reproduce the results of the current run.
There are at least two solutions:
Abandon the "reproduction" aspect, since the original bash calling command is lost to the portable python app, and instead go for "same effect".
Use a wrapper to capture the original calling command, as suggested by Jean-François Fabre.
With (1) you would be willing to accept ['-m', 'foo'] becoming ['foo.py'], or even turning it into ['/some/dir/foo.py'] in case PYTHONPATH could cause trouble. Displaying ['a', 'b c'] as "a" "b c", or more concisely as a "b c", is straightforward. If environment variables like SEED are an important part of the command line interface then you'll need to iterate over envp and output them, as well. For true reproducibility, you might choose to convert input args to canonical form, compare with observed input args, and exec using the canonical form if they're not identical, so there's no way to execute the bulk of your code using "odd" syntax.
With (2) you would bury the app in some inconveniently named file, advertise the wrapper program far and wide, and enjoy the benefits of seeing args before they're munged.

how to make a python function a built-in function or the like?

I want to test whether a module exists in python or not, but there seems to be no direct solution, so I wrote a small helper function for it.
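One plausible shape for such a helper (the exact implementation is an assumption here, based on importlib.util.find_spec):
import importlib.util

def test_module(module_name):
    # True if the module can be found on the current path, without importing it
    return importlib.util.find_spec(module_name) is not None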
I hope that every time I open the python interactive interpreter, I can simply type
test_module(module_name)
and thus check whether a module exists or not.
So how can I make this function into something like a built-in function, so as to reach my goal?
thanks!
You can add it to the __builtin__ module:
import __builtin__

def test_module(module_name):
    pass  # do something here

__builtin__.test_module = test_module
In Python 3, the module is called builtins instead.
If you want this to be run every time you open your Python interpreter, you can create a usercustomize.py module in the USER_SITE location; it'll be imported every time you run Python.
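To see where that location is on a given installation (a quick check, not specific to any particular setup):
import site
print(site.getusersitepackages())   # the directory where usercustomize.py is looked up
print(site.ENABLE_USER_SITE)        # usercustomize is only imported when this is true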
Be careful with extending the built-ins, however. Adding names there makes them accessible globally to all Python code, and any code that accidentally or deliberately uses test_module where a NameError should have been raised will now silently pick up your custom function.
It's much better to put such things into a dedicated module and only import this when you actually need that function. Explicit is better than implicit.

how do you statically find dynamically loaded modules

How does one get (find the location of) the dynamically imported modules from a python script?
So, from my understanding, python can dynamically (at run time) load modules.
Be it using __import__(module_name), using exec "from x import y", or using imp.find_module("module_name") and then imp.load_module(param1, param2, param3, param4).
Knowing that, I want to get all the dependencies for a python file. This would include the dynamically loaded modules (or at least I tried to get them), whether they are loaded from hard-coded string objects or from strings returned by a function/method.
For a normal import module_name and from x import y you can either scan the code manually or use modulefinder.
So if I want to copy one python script and all its dependencies (including the custom dynamically loaded modules) how should I do that ?
You can't; the very nature of programming (in any language) means that you cannot predict what code will be executed without actually executing it. So you have no way of telling which modules could be included.
This is further complicated by user input; consider __import__(sys.argv[1]).
There's a lot of theoretical background on the first problem, which is normally described as the halting problem; the second obviously can't be done.
From a theoretical perspective, you can never know exactly what/where modules are being imported. From a practical perspective, if you simply want to know where the modules are, check the module.__file__ attribute or run the script under python -v to find files when modules are loaded. This won't give you every module that could possibly be loaded, but will get most modules with mostly sane code.
See also: How do I find the location of Python module sources?
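For instance, a quick way to list where the modules loaded so far came from:
import sys

for name, mod in sorted(sys.modules.items()):
    # C extensions and built-in modules have no __file__ attribute
    print(name, getattr(mod, '__file__', '(built-in)'))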
This is not possible to do 100% accurately. I answered a similar question here: Dependency Testing with Python
Just an idea and I'm not sure that it will work:
You could write a module that contains a wrapper for __builtin__.__import__. This wrapper would save a reference to the old __import__ and then assign a function to __builtin__.__import__ that does the following:
whenever called, get the current stack trace and work out the calling function. Maybe the information in the globals parameter to __import__ is enough.
get the module of that calling function, and store the name of this module and what will get imported
redirect the call to the real __import__
After you have done this you can call your application with python -m magic_module yourapp.py. The magic module must store the information somewhere where you can retrieve it later.
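A rough sketch of that wrapper idea (Python 3 naming; the list name is made up, and the "store the information somewhere" part is left out):
# magic_module.py -- sketch only
import builtins

_real_import = builtins.__import__
recorded_imports = []

def _tracing_import(name, globals=None, locals=None, fromlist=(), level=0):
    # the globals of the frame doing the import identify the caller
    caller = (globals or {}).get('__name__', '<unknown>')
    recorded_imports.append((caller, name, tuple(fromlist or ())))
    return _real_import(name, globals, locals, fromlist, level)

builtins.__import__ = _tracing_import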
That's quite a question.
Static analysis would have to predict all possible run-time execution paths, and even decide whether the program halts at all for a given input.
That is equivalent to the halting problem, and unfortunately there is no general solution.
The only way to resolve dynamic dependencies is to run the code.

Is there any way to clear python bytecode cache?

Each unit test I'm running writes python code out to a file, then imports it as a module. The problem is that the code changes, but further import statements don't pick up the new module contents.
I think what I need is a way to either force a reload of a module or clear the internal bytecode cache. Any ideas?
Thanks!
Reimporting modules is tricky to get right in all the edge cases. The documentation for reload mentions some of them. Depending on what you are testing, you may be better off testing the imports with separate invocations of the interpreter, running each via, say, subprocess. It will likely be slower, but also likely safer and more accurate testing.
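A sketch of that approach (the generated module name is just an example):
import subprocess
import sys

# write the generated code to a file, then exercise the import in a fresh interpreter
result = subprocess.run(
    [sys.executable, "-c", "import generated_module"],
    capture_output=True, text=True)
assert result.returncode == 0, result.stderr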
Use reload().
Reload a previously imported module. The argument must be a module object, so it must have been successfully imported before. This is useful if you have edited the module source file using an external editor and want to try out the new version without leaving the Python interpreter. The return value is the module object (the same as the module argument).
However, the module needs to be already loaded. A workaround is to handle the resulting NameError:
try:
    reload(math)
except NameError:
    import math
Write your code to differently-named modules. Writing new code into an existing file, and trying to import it again will not work well.
Alternatively, you can clobber sys.modules. For example:
import sys
import unittest

class MyTestCase(unittest.TestCase):
    def setUp(self):
        # Record sys.modules here so we can restore it in tearDown.
        self.old_modules = dict(sys.modules)

    def tearDown(self):
        # Remove any new modules imported during the test run. This lets us
        # import the same source files for more than one test.
        for m in [m for m in sys.modules if m not in self.old_modules]:
            del sys.modules[m]
Ran into a similar situation. Later on I found that the whitespace indentation technique used matters. Especially on Windows platforms, ensure that a uniform technique is adopted throughout the module, i.e., use either tabs or spaces exclusively.
