Show whether a Python module is loaded from bytecode - python

I'm trying to debug Hy's use of bytecode. In particular, each time a module is imported, I want to see the path it was actually imported from, whether source or bytecode. Under the hood, Hy manages modules with importlib. It doesn't explicitly read or write bytecode; that's taken care of by importlib.machinery.SourceFileLoader. So it looks like what I want to do is monkey-patch Python's importing system to print the import path each time an import happens. How can I do that? I should be able to figure out how to do it for Hy once I understand how to do it for Python.

The easiest way that does not involve coding, is to start Python with two(!) verbose flags:
python -vv myscript.py
you'll get a lot of output, including all the import statements and all the files Python tries to access in order to load the module. In this example I have a simple python script that does import json:
lots of output!
[...]
# trying /tmp/json.cpython-310-x86_64-linux-gnu.so
# trying /tmp/json.abi3.so
# trying /tmp/json.so
# trying /tmp/json.py
# trying /tmp/json.pyc
# /usr/lib/python3.10/json/__pycache__/__init__.cpython-310.pyc matches /usr/lib/python3.10/json/__init__.py
# code object from '/usr/lib/python3.10/json/__pycache__/__init__.cpython-310.pyc'
[...]
Alternatively but more complex: you could change the import statement itself. For that, you can overwrite __import__, which is invoked by the import statement itself. This way you could print out all the files import actually opens.

Seems like a good option would be to dynamically patch importlib.machinery.SourceFileLoader(fullname, path) and importlib.machinery.SourcelessFileLoader(fullname, path) to each print or write to a variable (a) the calling method and (b) the argument passed to the function.
If all you need to do is:
I want to see the path it was actually imported from, whether source or bytecode
And you don't need the import to "work properly", perhaps you can do a modified version of something like this. For example, I quickly modified their sample code to get this, I have not tested it so it may not work exactly, but it should get you on the right track:
# custom class to be the mock return value
class MockSourceLoader:
# mock SourceFileLoader method always returns that the module was loaded from source and its path
def SourceFileLoader(fullname, path):
return {"load type": "source", "fullname": fullname, "path": path}
def check_how_imported(monkeypatch):
# Any arguments may be passed and mock_get() will always return our
# mocked object
def mock_get(*args, **kwargs):
return MockSourceLoader
# apply the monkeypatch
monkeypatch.setattr(importlib.machinery, SourceFileLoader, SourceFileLoader)
You would of course provide a similar mock for Sourceless file loading for SourcelessFileLoader
For reference:
https://docs.python.org/3/library/importlib.html#:~:text=importlib.machinery.SourceFileLoader(fullname%2C%20path)%C2%B6
https://docs.python.org/3/library/importlib.html#:~:text=importlib.machinery.SourcelessFileLoader(fullname%2C%20path)

Related

Cannot import a function from a module

Basically I have 3 modules that all communicate with eachother and import eachother's functions. I'm trying to import a function from my shigui.py module that creates a gui for the program. Now I have a function that gets the values of user entries in the gui and I want to pass them to the other module. I'm trying to pass the function below:
def valueget():
keywords = kw.get()
delay = dlay.get()
category = catg.get()
All imports go fine, up until I try to import this function with
from shigui import valueget to another module that would use the values. In fact, I can't import any function to any module from this file. Also I should add that they are in the same directory. I'm appreciative of any help on this matter.
Well, I am not entirely sure of what imports what, but here is what I can tell you. Python can sometimes allow for circular dependencies. However, it depends on what the layout of your dependencies is. First and foremost, I would say see if there is any way you can avoid this happening (restructuring your code, etc.). If it is unavoidable then there is one thing you can try. When Python imports modules, it does so in order of code execution. This means that if you have a definition before an import, you can sometimes access the definition in the first module by importing that first module in the second module. Let me give an example. Consider you have two modules, A and B.
A:
def someFunc():
# use B's functionality from before B's import of A
pass
import B
B:
def otherFunc():
# use A's functionality from before A's import of B
pass
import A
In a situation like that, Python will allow this. However, everything after the imports is not always fair game so be careful. You can read up on Python's module system more if you want to know why this works.
Helpful, but not complete link: https://docs.python.org/3/tutorial/modules.html

executing python code from string loaded into a module

I found the following code snippet that I can't seem to make work for my scenario (or any scenario at all):
def load(code):
# Delete all local variables
globals()['code'] = code
del locals()['code']
# Run the code
exec(globals()['code'])
# Delete any global variables we've added
del globals()['load']
del globals()['code']
# Copy k so we can use it
if 'k' in locals():
globals()['k'] = locals()['k']
del locals()['k']
# Copy the rest of the variables
for k in locals().keys():
globals()[k] = locals()[k]
I created a file called "dynamic_module" and put this code in it, which I then used to try to execute the following code which is a placeholder for some dynamically created string I would like to execute.
import random
import datetime
class MyClass(object):
def main(self, a, b):
r = random.Random(datetime.datetime.now().microsecond)
a = r.randint(a, b)
return a
Then I tried executing the following:
import dynamic_module
dynamic_module.load(code_string)
return_value = dynamic_module.MyClass().main(1,100)
When this runs it should return a random number between 1 and 100. However, I can't seem to get the initial snippet I found to work for even the simplest of code strings. I think part of my confusion in doing this is that I may misunderstand how globals and locals work and therefore how to properly fix the problems I'm encountering. I need the code string to use its own imports and variables and not have access to the ones where it is being run from, which is the reason I am going through this somewhat over-complicated method.
You should not be using the code you found. It is has several big problems, not least that most of it doesn't actually do anything (locals() is a proxy, deleting from it has no effect on the actual locals, it puts any code you execute in the same shared globals, etc.)
Use the accepted answer in that post instead; recast as a function that becomes:
import sys, imp
def load_module_from_string(code, name='dynamic_module')
module = imp.new_module(name)
exec(code, mymodule.__dict__)
return module
then just use that:
dynamic_module = load_module_from_string(code_string)
return_value = dynamic_module.MyClass().main(1, 100)
The function produces a new, clean module object.
In general, this is not how you should dynamically import and use external modules. You should be using __import__ within your function to do this. Here's a simple example that worked for me:
plt = __import__('matplotlib.pyplot', fromlist = ['plt'])
plt.plot(np.arange(5), np.arange(5))
plt.show()
I imagine that for your specific application (loading from code string) it would be much easier to save the dynamically generated code string to a file (in a folder containing an __init__.py file) and then to call it using __import__. Then you could access all variables and functions of the code as parts of the imported module.
Unless I'm missing something?

Hacking python's import statement

I'm building a Python module for a fairly specific purpose. What I'd like to do with this is get more functionality behind importing things from it.
I'd like to have a setup by which saying from my_module import foo would run a function and pass the string "foo". This function would return the object that should be imported.
For example, maybe I want to make a cloud-based import system. I'd like to store community scripts in the cloud, and then download them when a user tries to import them.
Maybe I use the code from cloud import test_module. This would check a cache to decide whether test_module had been downloaded. If so, it would return that module. If not, it would download the module before importing it.
How can I accomplish something like this in Python, by which a dynamic range of submodules could be seamlessly imported from the cloud?
Full featured support for what you ask probably requires a bunch of complicated code using importlib and hooking into various parts of the import machinery. However, a more limited solution can be implemented with just a single custom class that pretends to be a module.
When you import a module, Python first checks in the sys.modules dictionary to see if the module is a key. If so, it returns the value associated with the key. It does this regardless of what the value is, so you can put any kind of object in sys.modules and Python will treat it like a module. A module's code can even replace its own entry in sys.modules, and the replacement will be used even the first time it is imported!
So, to implement your fancy module that downloads other modules on demand, replace the module itself with an instance of a custom class, and write that class a __getattr__ or __getattribute__ method that does the work you want.
Here's a trivial example module that returns a string for any attribute you look for in it. The string will always be the same as the requested attribute name. In your code, you'd want to do your fancy web-cache lookups and downloading, and then return the fetched module object instead of just returning a string.
class FakeModule(object):
def __getattribute__(self, name):
return name
import sys
sys.modules[__name__] = FakeModule()
On my system I've saved that as fakemodule.py. Now if I do from fakemodule import foo, I get foo with the value 'foo' in my local namespace.
Note that this only works for one level deep imports. If you do from fakemodule.subpackage import name it will not work because there's no fakemodule.subpackage entry in sys.modules.

Lazy-loading modules in python

I'm trying to put together a system that will handle lazy-loading of modules that don't explicitly exist. Basically I have an http server with a number of endpoints that I don't know ahead of time that I would like to programmatically offer for import. These modules would all have a uniform method signature, they just wouldn't exist ahead of time.
import lazy.route as test
import lazy.fake as test2
test('Does this exist?') # This sends a post request.
test2("This doesn't exist.") # Also sends a post request
I can handle all the logic I need around these imports with a uniform decorator, I just can't find any way of "decorating" imports in python, or actually interacting with them in any kind of programmatic way.
Does anyone have experience with this? I've been hunting around, and the closest thing I've found is the ast module, which would lead to a really awful kind of hacky implementation in my current under my current understanding (something like finding all import statements and manually over-writing the import function)
Not looking for a handout, just a piece of the python codebase to start looking at, or an example of someone that's done something similar.
I got a little clever in my googling and managed to find a PEP that specifically addressed this issue, it just happens to be relatively unknown, probably because the subset of reasonable uses for this is pretty narrow.
I found an excellent piece of example code showing off the new sys.meta_path implementation. I've posted it below for information on how to dynamically bootstrap your import statements.
import sys
class VirtualModule(object):
def hello(self):
return 'Hello World!'
class CustomImporter(object):
virtual_name = 'my_virtual_module'
def find_module(self, fullname, path):
"""This method is called by Python if this class
is on sys.path. fullname is the fully-qualified
name of the module to look for, and path is either
__path__ (for submodules and subpackages) or None (for
a top-level module/package).
Note that this method will be called every time an import
statement is detected (or __import__ is called), before
Python's built-in package/module-finding code kicks in."""
if fullname == self.virtual_name:
# As per PEP #302 (which implemented the sys.meta_path protocol),
# if fullname is the name of a module/package that we want to
# report as found, then we need to return a loader object.
# In this simple example, that will just be self.
return self
# If we don't provide the requested module, return None, as per
# PEP #302.
return None
def load_module(self, fullname):
"""This method is called by Python if CustomImporter.find_module
does not return None. fullname is the fully-qualified name
of the module/package that was requested."""
if fullname != self.virtual_name:
# Raise ImportError as per PEP #302 if the requested module/package
# couldn't be loaded. This should never be reached in this
# simple example, but it's included here for completeness. :)
raise ImportError(fullname)
# PEP#302 says to return the module if the loader object (i.e,
# this class) successfully loaded the module.
# Note that a regular class works just fine as a module.
return VirtualModule()
if __name__ == '__main__':
# Add our import hook to sys.meta_path
sys.meta_path.append(CustomImporter())
# Let's use our import hook
import my_virtual_module
print my_virtual_module.hello()
The full blog post is here

python unittest faulty module

I have a module that I need to test in python.
I'm using the unittest framework but I ran into a problem.
The module has some method definitions, one of which is used when it's imported (readConfiguration) like so:
.
.
.
def readConfiguration(file = "default.xml"):
# do some reading from xml
readConfiguration()
This is a problem because when I try to import the module it also tries to run the "readConfiguration" method which fails the module and the program (a configuration file does not exist in the test environment).
I'd like to be able to test the module independent of any configuration files.
I didn't write the module and it cannot be re-factored.
I know I can include a dummy configuration file but I'm looking for a "cleaner", more elegant, solution.
As commenters have already pointed out, imports should never have side effects, so try to get the module changed if at all possible.
If you really, absolutely, cannot do this, there might be another way: let readConfiguration() be called, but stub out its dependencies. For instance, if it uses the builtin open() function, you could mock that, as demonstrated in the mock documentation:
>>> mock = MagicMock(return_value=sentinel.file_handle)
>>> with patch('builtins.open', mock):
... import the_broken_module
... # do your testing here
Replace sentinel.file_handle with StringIO("<contents of mock config file>") if you need to supply actual content.
It's brittle as it depends on the implementation of readConfiguration(), but if there really is no other way, it might be useful as a last resort.

Categories