how do you statically find dynamically loaded modules - python

How does one find the location of modules that a Python script imports dynamically?
From my understanding, Python can load modules dynamically (at run time), be it with __import__(module_name), with exec "from x import y", or with imp.find_module("module_name") followed by imp.load_module(name, file, pathname, description).
Knowing that, I want to get all the dependencies of a Python file. This would include the dynamically loaded modules, whether they are loaded from hard-coded string literals or from strings returned by a function/method.
For a normal import module_name or from x import y you can either scan the code manually or use modulefinder.
So if I want to copy one Python script and all its dependencies (including the custom dynamically loaded modules), how should I do that?

You can't; the very nature of programming (in any language) means that you cannot predict what code will be executed without actually executing it. So you have no way of telling which modules could be imported.
This is further complicated by user input; consider: __import__(sys.argv[1]).
There is a lot of theoretical work on the first problem, which is normally described as the Halting problem; the second obviously can't be done at all.

From a theoretical perspective, you can never know exactly what modules will be imported or where they come from. From a practical perspective, if you simply want to know where the modules are, check the module.__file__ attribute, or run the script under python -v to see which files are opened as modules are loaded. This won't give you every module that could possibly be loaded, but it will catch most modules in reasonably sane code.
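For example, a minimal sketch of the __file__ approach (json is just an arbitrary stdlib example; some modules compiled into the interpreter have no __file__ at all):

import importlib

mod = importlib.import_module('json')
print(getattr(mod, '__file__', '<built-in>'))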
See also: How do I find the location of Python module sources?

This is not possible to do 100% accurately. I answered a similar question here: Dependency Testing with Python

Just an idea, and I'm not sure that it will work:
You could write a module that contains a wrapper for __builtin__.__import__. This wrapper would save a reference to the old __import__ and then assign to __builtin__.__import__ a function that does the following:
whenever called, get the current stack trace and work out the calling function. Maybe the information in the globals parameter to __import__ is enough.
get the module of that calling function and store its name together with what gets imported
redirect the call to the real __import__
After you have done this you can run your application with python -m magic_module yourapp.py. The magic module must store the information somewhere you can retrieve it later.
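A minimal sketch of that idea, written for Python 3 (where __builtin__ is named builtins); the module name magic_module, the _recorded list, and the printed format are illustrative only:

# magic_module.py
import builtins
import runpy
import sys

_real_import = builtins.__import__
_recorded = []  # (importing module, imported name) pairs

def _tracking_import(name, globals=None, locals=None, fromlist=(), level=0):
    # the __name__ entry in the caller's globals identifies the importing module
    caller = globals.get('__name__', '?') if globals else '?'
    _recorded.append((caller, name))
    return _real_import(name, globals, locals, fromlist, level)

builtins.__import__ = _tracking_import

if __name__ == '__main__':
    # invoked as: python -m magic_module yourapp.py
    sys.argv = sys.argv[1:]
    runpy.run_path(sys.argv[0], run_name='__main__')
    for caller, name in _recorded:
        print('%s imports %s' % (caller, name))

Note this only records imports that actually execute on that particular run; code paths that are not taken stay invisible, as the other answers point out.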

That's quite a question.
Static analysis is about predicting all possible run-time execution paths (and, to begin with, whether the program even halts for a given input).
That is equivalent to the Halting Problem, and unfortunately there is no general solution.
The only way to resolve dynamic dependencies is to run the code.

Related

Is there an easy way to find what function is calling another function in Python?

I'm not too well-versed in Python; I think this question is mostly language-agnostic, but I wanted to see if there was a particular way to do it for Python.
I'm working with a couple of PyTorch Python scripts.
The function test_qtensor_cpu on line 147 in https://github.com/pytorch/pytorch/blob/fbcode/warm/test/quantization/core/test_quantized_tensor.py is executed when I run the script in https://github.com/pytorch/pytorch/blob/fbcode/warm/test/test_quantization.py, but I cannot find the function that calls test_qtensor_cpu. I did a grep -ri "test_qtensor_cpu*" . in the root directory of this repo and the only result was the definition of this function.
Is there a way for this function to be called without explicitly writing out the function's name?
Just add:
def my_func_i_cant_figure_out_whats_calling_it():
    import traceback
    traceback.print_stack()
That will show you the call stack at that point, even without a breakpoint.
Yes, it's possible to call a function without explicitly writing out its name.
(And figuring out how to use a debugger is super useful... future you will thank you if you figure it out sooner rather than later.)
Line 29 of test_quantization.py imports TestQuantizedTensor (which includes the test_qtensor_cpu method).
The run_tests() (source here) at the bottom of the file will automatically run all test cases that have been imported (which includes TestQuantizedTensor) via unittest (usually via unittest.main, though this can be changed by args passed to the test suite).
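A minimal self-contained illustration of that mechanism (class and method names are made up to mirror the question):

import unittest

class TestQuantizedTensor(unittest.TestCase):
    # never called by name anywhere; unittest finds it by convention
    def test_qtensor_cpu(self):
        self.assertTrue(True)

if __name__ == '__main__':
    # unittest.main() scans this module's namespace for TestCase
    # subclasses (including imported ones) and runs every test_* method
    unittest.main()

Importing TestQuantizedTensor into another file and calling unittest.main() there runs test_qtensor_cpu just the same, which is why grep never finds an explicit call.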

How to make a list of the user-created python files that a module depends on?

I am interested in using doit to automate the build process of a python package.
If possible, I would like doit to re-execute a task if any of the user-created source files it depends on have changed.
From my understanding, the best way to accomplish this would be to use the file_dep key with a list of the dependent source files; however, I am having a lot of trouble generating this list.
I've tried using sys.modules and inspect.getmembers(), but these solutions can't deal with import statements that do not import a module, such as from x import Y, which is unfortunately a common occurrence in the package I am developing.
Another route I investigated was to use the snakefood tool, which initially looks like it would do exactly what I wanted, generate a list of file dependencies for every file in a given path.
Unfortunately, this tool seems to have limited Python 3 support, making it useless for my package.
Does anyone have any insight into how to get snakefood-like features in Python 3, or is the only option to change all of my source code to only import modules?
The doit tutorial itself is about creating a graph of Python module imports!
It uses the import_deps package, which is similar to snakefood.
Note that for your use case you will need to modify file_dep itself during the task action's execution. To achieve that you need to pass the task parameter to your action (as described here).
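A rough sketch of what such a dodo.py could look like. To keep it self-contained, the import scan below uses the stdlib ast module rather than import_deps (which exposes similar information); main.py and the flat directory layout are hypothetical:

import ast
import pathlib

def local_imports(path):
    # top-level module names imported by a .py file; ast.ImportFrom
    # covers the 'from x import Y' statements that defeat sys.modules
    tree = ast.parse(pathlib.Path(path).read_text())
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split('.')[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split('.')[0])
    return names

def task_build():
    # re-run whenever main.py or any local module it imports changes
    deps = [str(p) for p in pathlib.Path('.').glob('*.py')
            if p.stem in local_imports('main.py')]
    return {
        'actions': ['python main.py'],
        'file_dep': ['main.py'] + deps,
    }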

How bad is it to explicitly add things to locals()?

I'm trying to dynamically load modules as explained here.
I have written a script that requires some modules that may not be installed by default on some systems (such as requests). The script code assumes that a regular import has been done (it uses requests.get).
If I use the code in the link above, to import requests I would have to use:
requests = importlib.import_module('requests')
But this leads to a lot of code duplication, since I have several modules. I can't use that in a loop, because the variable name must change with the imported module.
I have found that I can use:
for m in list_of_modules:
    locals()[m] = importlib.import_module(m)
And everything happens as if I had done regular imports.
(Of course the real code catches exceptions...)
So the question is: how valid/risky is this? Too good to be true, or not? (Python 2.7, if that makes a difference.)
It is explicitly invalid. The documentation of Python 2.7.15 says of the locals() function:
The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter.
locals() is a way for the program to know the list of variables in a function block. It is not a way to create local variables.
If you really need something like that, you can use a local map, rely on the sys.modules map (which is updated by import_module), or update the globals() map. Anyway, once a module has been loaded, it exists (through the sys.modules map) for the whole program, so it does not really make sense to store its reference in a local symbol table.
So if you really need to import a dynamically built list of modules, I would do:
for m in list_of_modules:
    globals()[m] = importlib.import_module(m)
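For completeness, a small sketch of the sys.modules route mentioned above (the module list is made up):

import importlib
import sys

list_of_modules = ['json', 'csv']  # hypothetical
for m in list_of_modules:
    importlib.import_module(m)  # caches the module in sys.modules[m]

# later code can fetch any loaded module from the cache by name
print(sys.modules['json'].dumps({'ok': True}))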

How to tell whether a Python function with dependencies has changed?

tl;dr:
How can I cache the results of a Python function to disk and in a later session use the cached value if and only if the function code and all of its dependencies are unchanged since I last ran it?
In other words, I want to make a Python caching system that automatically watches out for changed code.
Background
I am trying to build a tool for automatic memoization of computational results from Python. I want the memoization to persist between Python sessions (i.e. be reusable at a later time in another Python instance, preferably even on another machine with the same Python version).
Assume I have a Python module mymodule with some function mymodule.func(). Let's say I already solved the problem of serializing/identifying the function arguments, so we can assume that mymodule.func() takes no arguments if it simplifies anything.
Also assume that I guarantee that the function mymodule.func() and all its dependencies are deterministic, so mymodule.func() == mymodule.func().
The task
I want to run the function mymodule.func() today and save its results (and any other information necessary to solve this task). When I want the same result at a later time, I would like to load the cached result instead of running mymodule.func() again, but only if the code in mymodule.func() and its dependencies are unchanged.
To simplify things, we can assume that the function is always run in a freshly started Python interpreter with a minimal script like this:
from some_save_module import some_save_function  # hypothetical save helper
import mymodule

result = mymodule.func()
some_save_function(result, 'filename')
Also, note that I don't want to be overly conservative. It is probably not too hard to use the modulefinder module to find all modules involved when running the first time, and then not use the cache if any module has changed at all. But this defeats my purpose, because in my use case it is very likely that some unrelated function in an imported module has changed.
Previous work and tools I have looked at
joblib memoizes results tied to the function name, and also saves the source code so it can check whether it is unchanged. However, as far as I understand, it does not check upstream functions (those called by mymodule.func()).
The ast module gives me the Abstract Syntax Tree of any Python code, so I guess I can (in principle) figure it all out that way. How hard would this be? I am not very familiar with the AST. (A rough sketch of this direction appears below, after this list.)
Can I use any of the black magic that's going on inside dill?
More trivia than a solution: IncPy, a finished/deceased research project, implemented a Python interpreter that did this by default, always. Nice idea, but it never made it outside the lab.
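Here is the rough AST sketch referred to above: hash the function's source together with the source of every module-level function it calls by name, one level deep only (a real tool would have to recurse and follow imports). The helper name is made up:

import ast
import hashlib
import inspect

def code_fingerprint(func, module):
    src = inspect.getsource(func)
    # names of functions called as plain identifiers, e.g. helper(...)
    called = {node.func.id for node in ast.walk(ast.parse(src))
              if isinstance(node, ast.Call)
              and isinstance(node.func, ast.Name)}
    parts = [src]
    for name in sorted(called):
        try:
            parts.append(inspect.getsource(getattr(module, name, None)))
        except TypeError:
            pass  # builtins and C functions have no Python source
    return hashlib.sha256(''.join(parts).encode()).hexdigest()

A cached result would then be reused only if the stored fingerprint matches the current one.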
Grateful for any input!

Is there a way to automatically verify all imports in a Python script without running it?

Suppose that I have a somewhat long Python script (too long to hand-audit) that contains an expensive operation, followed by a bunch of library function calls that are dependent on the output of the expensive operation.
If I have not imported all the necessary modules for the library function calls, then Python will error out only after the expensive operation has finished, because Python executes the script line by line.
Is there a way to automatically verify that I have all the necessary imports without either a) manually verifying it line by line or b) running through the expensive operation each time I miss a library?
Another way to put the question: is there a tool that will do what the C compiler does with respect to verifying dependencies before run time?
No, this is not possible, because dependencies can be injected at runtime.
Consider:
def foo(break_things):
    if not break_things:
        globals()['bar'] = lambda: None

long_result = ...
foo(long_result > 0)
bar()
Depending on the runtime value of long_result, this may give NameError: name 'bar' is not defined.
There is a module called snakefood:
Generate dependency graphs from Python code
It uses the AST to parse the Python files.
This is very reliable; it always runs. No module is loaded. Loading modules to figure out dependencies is almost always a problem, because a lot of codebases run initialization code in the global namespace, which often requires additional setup. Snakefood is guaranteed not to have this problem (it just runs, no matter what).
You can get a list of imports by calling sfood-imports <script.py>. Then you can import each module in the list one by one and see if it throws ImportError.
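That check is easy to script; a minimal sketch, with the hard-coded list standing in for whatever sfood-imports printed:

import importlib

module_names = ['requests', 'numpy', 'nosuchmodule']  # hypothetical sfood-imports output
for name in module_names:
    try:
        importlib.import_module(name)
    except ImportError as exc:
        print('missing: %s (%s)' % (name, exc))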
Or, just use pylint. Quote from docs:
Error detection
checking if declared interfaces are truly implemented
checking if modules are imported
Hope that helps.
