Inject code automatically into a Python project and execute it

I want to inject some code to all python modules in my project automatically.
I used ast.NodeTransformer and managed to transform a module quite easily; the problem is that I then want to run the whole project.
An AST is built per module, and I want to change all modules in the project and then run it. I have this example.
The problem is that it applies to one tree, viz. one file. I want to run this file, which imports and uses other files that I want to change too, so I'm not sure how to get it done.
I know I can use some ast-to-code module, like astor, but all are third party and I don't want to deal with bugs and unexpected issues.
Don't really know how to start, any suggestions?

Regarding the concern about third-party AST-to-code modules such as astor: from Python 3.9 onward the standard library has ast.unparse, which converts the AST back to source code after you transform it.
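To make that concrete, here is a minimal sketch (Python 3.9+) that applies an ast.NodeTransformer to every .py file under a directory and writes the regenerated source into a separate output directory, which can then be run instead of the original. The names InjectLogging, transform_source and transform_project are made up for illustration, and the injected print call is just a placeholder for whatever you actually want to inject. Note that ast.unparse drops comments and original formatting, and that this naive transformer inserts its statement before any docstring.

```python
import ast
from pathlib import Path


class InjectLogging(ast.NodeTransformer):
    """Hypothetical transformer: insert a print call at the top of each function."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # also transform nested functions
        call = ast.parse(f"print('entering {node.name}')").body[0]
        node.body.insert(0, call)
        return node

    visit_AsyncFunctionDef = visit_FunctionDef


def transform_source(source: str) -> str:
    """Parse, transform, and turn the tree back into source text."""
    tree = ast.parse(source)
    tree = InjectLogging().visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


def transform_project(src_root, dst_root):
    """Write a transformed copy of every .py file under src_root into dst_root.

    Assumes dst_root is not located inside src_root.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    for src in list(src_root.rglob("*.py")):
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_text(transform_source(src.read_text(encoding="utf-8")),
                       encoding="utf-8")


original = "def add(a, b):\n    return a + b\n"
new_source = transform_source(original)
print(new_source)
```

After transform_project("myproject", "myproject_injected") (both paths hypothetical) you would run the entry script from the output directory, so every import picks up the transformed modules.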

Related

How to make a list of the user-created python files that a module depends on?

I am interested in using doit to automate the build process of a python package.
If possible, I would like doit to re-execute a task if any of the user-created source files it depends on have changed.
From my understanding, the best way to accomplish this would be to use the file_dep key and a list of the dependent source files, however I am having a lot of trouble generating this list.
I've tried using sys.modules and inspect.getmembers(), but these solutions can't deal with import statements that do not import a module, such as from x import Y, which is unfortunately a common occurrence in the package I am developing.
Another route I investigated was to use the snakefood tool, which initially looks like it would do exactly what I wanted, generate a list of file dependencies for every file in a given path.
Unfortunately, this tool seems to have limited Python 3 support, making it useless for my package.
Does anyone have any insight into how to get snakefood-like features in Python 3, or is the only option to change all of my source code to only import modules?
The doit tutorial itself is about creating a graph of Python module imports!
It uses the import_deps package, which is similar to snakefood.
Note that for your use-case you will need to modify file_dep itself during Task action's execution. To achieve that you need to pass the task parameter to your action (as described here).
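If you just want to see the idea without an extra package, here is a rough sketch using only the standard-library ast module (this is not how import_deps or doit do it, just an illustration). It collects the module names from both "import x" and "from x import Y" statements, then keeps the ones that resolve to files under a project root. The function names imported_module_names and user_files are my own, and relative imports are skipped for brevity.

```python
import ast
from pathlib import Path


def imported_module_names(source: str) -> set:
    """Return the absolute module names imported by the given source text."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b as ab" -> "a.b"
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # "from pkg.mod import Y" -> "pkg.mod"; relative imports are skipped
            if node.level == 0 and node.module:
                names.add(node.module)
    return names


def user_files(names, root):
    """Map module names to files under root (assumed to be the project root)."""
    found = []
    for name in sorted(names):
        base = Path(root, *name.split("."))
        for candidate in (base.with_suffix(".py"), base / "__init__.py"):
            if candidate.is_file():
                found.append(candidate)
    return found


example = "import os\nimport a.b as ab\nfrom pkg.mod import Y\nfrom . import sibling\n"
print(sorted(imported_module_names(example)))
```

The list returned by user_files could then be fed into file_dep; standard-library and third-party modules drop out because they do not live under the project root.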

How to tell whether a Python function with dependencies has changed?

tl;dr:
How can I cache the results of a Python function to disk and in a later session use the cached value if and only if the function code and all of its dependencies are unchanged since I last ran it?
In other words, I want to make a Python caching system that automatically watches out for changed code.
Background
I am trying to build a tool for automatic memoization of computational results from Python. I want the memoization to persist between Python sessions (i.e. be reusable at a later time in another Python instance, preferably even on another machine with the same Python version).
Assume I have a Python module mymodule with some function mymodule.func(). Let's say I already solved the problem of serializing/identifying the function arguments, so we can assume that mymodule.func() takes no arguments if it simplifies anything.
Also assume that I guarantee that the function mymodule.func() and all its dependencies are deterministic, so mymodule.func() == mymodule.func().
The task
I want to run the function mymodule.func() today and save its results (and any other information necessary to solve this task). When I want the same result at a later time, I would like to load the cached result instead of running mymodule.func() again, but only if the code in mymodule.func() and its dependencies are unchanged.
To simplify things, we can assume that the function is always run in a freshly started Python interpreter with a minimal script like this:
import some_save_function
import mymodule
result = mymodule.func()
some_save_function(result, 'filename')
Also, note that I don't want to be overly conservative. It is probably not too hard to use the modulefinder module to find all modules involved when running the first time, and then not use the cache if any module has changed at all. But this defeats my purpose, because in my use case it is very likely that some unrelated function in an imported module has changed.
Previous work and tools I have looked at
joblib memoizes results tied to the function name, and also saves the source code so we can check if it is unchanged. However, as far as I understand it does not check upstream functions (called by mymodule.func()).
The ast module gives me the Abstract Syntax Tree of any Python code, so I guess I can (in principle) figure it all out that way. How hard would this be? I am not very familiar with the AST.
Can I use any of all the black magic that's going on inside dill?
More trivia than a solution: IncPy, a finished/deceased research project, implemented a Python interpreter doing this by default, always. Nice idea, but never made it outside the lab.
Grateful for any input!
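One possible starting point, as a rough sketch only: compute a fingerprint from the bytecode of the function and of every plain Python function it references through its module globals, and store that fingerprint next to the cached result. The names fingerprint and _hash_code are invented for this illustration. It assumes the same Python version between runs (bytecode differs across versions), and it does not follow methods, attribute lookups such as othermodule.f, or names used only inside nested functions.

```python
import hashlib
import types


def _hash_code(code, h):
    """Feed a code object's bytecode and constants into the hash object h."""
    h.update(code.co_code)
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            _hash_code(const, h)  # nested functions, lambdas, comprehensions
        else:
            h.update(repr(const).encode())


def fingerprint(func):
    """Hash func together with the global functions it (transitively) names."""
    h = hashlib.sha256()
    seen = set()
    stack = [func]
    while stack:
        f = stack.pop()
        if id(f) in seen:
            continue
        seen.add(id(f))
        h.update(f.__qualname__.encode())
        _hash_code(f.__code__, h)
        # co_names lists the global and attribute names the bytecode refers to
        for name in f.__code__.co_names:
            dep = f.__globals__.get(name)
            if isinstance(dep, types.FunctionType):
                stack.append(dep)
    return h.hexdigest()


def helper():
    return 1


def func():
    return helper() + 1


def unrelated():
    return "a"


before = fingerprint(func)
print(before)
```

Changing unrelated leaves the fingerprint of func alone, while changing helper alters it, which is the "not overly conservative" behaviour described above.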

Where to Store Borrowed Python Code?

Recently, I have been working on a Python project with usual directory structure, and have received help from someone else who has given me a code snippet (a single function definition, about 30 lines long) which I would like to import into my code. What is the most proper directory/location in a Python project to store borrowed code of this size? Is it best to store the snippet into an entirely different module and import it from there?
I generally find it easiest to put such code in a separate file, because for clarity you don't want more than one different copyright/licensing term to apply within a single file. So in Python this does indeed mean a separate module. Then the file can contain whatever attribution and other legal boilerplate you need.
As long as your file headers don't accidentally claim copyright on something to which you do not own the copyright, I don't think it's actually a legal problem to mix externally-licensed or public domain code into files you mostly own. I may be wrong, though, which is why I normally avoid giving myself reason to think about it. A comment saying "this is external code from the following source with the following license:" may well be clearer than dividing code into different files that naturally wouldn't be. So I do occasionally do that.
I don't see any definite need for a separate directory (or package) per separate external source. If that's already part of your project structure (that is, it already uses external libraries by incorporating their source) then I suppose you might as well continue the trend.
I usually place scripts I copy off the internet in a folder/package called borrowed so I know all of the code here is stuff that I didn't write myself.
That is, if it's something more substantial than a one or two-liner demonstrating how something works.

Navigating a big Python codebase faster

As programmers we read more than we write. I've started working at a company that uses a couple of "big" Python packages; packages or package-families that have a high KLOC. Case in point: Zope.
My problem is that I have trouble navigating this codebase quickly and easily. My current strategy is:
1. I start reading a module I need to change/understand.
2. I hit an import which I need to know more about.
3. I find out where the source code for that import is by placing a Python debugger (pdb) statement after the imports and echoing the module, which tells me its source file.
4. I navigate to it, in the shell or the Vim file explorer.
5. Most of the time the module itself imports more modules, and before I know it I've got 10 KLOC "on my plate".
Alternatively:
1. I see a method/class I need to know more about.
2. I do a search (ack-grep) for the definition of that method/class across the whole codebase (which can be a pain because the codebase is partly in ~/.buildout-eggs).
3. I find one or more pieces of code that define that method/class.
4. I have to deduce which one of them is the one I need to read.
This costs a lot of time, which is understandable for a big codebase. But I get the feeling that navigating a large and unknown Python codebase is a common enough problem.
So I'm looking for technical tools or strategic solutions for this problem.
...
I just can't imagine hardcore Python programmers using the strategies outlined above.
In Vim, I like NERDTree (a file browser) and taglist.vim (a source code browser: http://www.vim.org/scripts/script.php?script_id=273).
Also in Vim, you can use CTRL-] to jump to a definition (:h CTRL-]):
1. Download Exuberant Ctags: http://ctags.sourceforge.net/
2. Follow the install directions and put it somewhere on your PATH.
3. From the 'root' directory of your source code, make a tags file from the shell: "ctags -R"
4. (Make sure you have :set noautochdir, and make sure :pwd is the root directory from step 3.)
5. Go into Vim, put the cursor over some function or class name, and hit CTRL-].
   - By default, if there are multiple matches for the tag, it shows you everywhere it was imported and where it was declared.
   - If the tag only has one match, it immediately jumps to it.
...then use Ctrl+O and Ctrl+I to move back and forth from where you were.
(Repeat the above steps for the source code of particular libraries you use; I usually keep a separate Vim window open to study stuff.)
I use ipython's ?? command
You just need to figure out how to import the things you want to look for, then add ?? to the end of the module or class or function or method name to view their source code. And the command completion helps on figuring out long names as well.
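Outside IPython, roughly the same information is available from the standard-library inspect module, which also replaces the "drop into pdb and echo the module" trick from the question. A small sketch, using json from the standard library as a stand-in for whatever module you are exploring:

```python
import inspect
import json  # stand-in for the module you want to explore

# Path of the file that defines the module (no debugger needed)
source_file = inspect.getsourcefile(json)

# Source text of a single function, similar to what `json.dumps??` shows
source_text = inspect.getsource(json.dumps)

print(source_file)
print(source_text.splitlines()[0])
```

This only works for objects defined in Python source files; built-in or C-extension objects raise TypeError or OSError.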
Try red pill: https://github.com/klen/python-mode

how do you statically find dynamically loaded modules

How does one find the location of the dynamically imported modules used by a Python script?
From my understanding, Python can load modules dynamically (at run time),
be it using __import__(module_name), using exec "from x import y", or using imp.find_module("module_name") followed by imp.load_module(param1, param2, param3, param4).
Knowing that, I want to get all the dependencies of a Python file. This would include getting (or at least trying to get) the dynamically loaded modules, whether they are loaded using hard-coded string objects or strings returned by a function/method.
For normal import module_name and from x import y statements you can either scan the code manually or use modulefinder.
So if I want to copy one Python script and all its dependencies (including the custom dynamically loaded modules), how should I do that?
You can't; the very nature of programming (in any language) means that you cannot predict what code will be executed without actually executing it. So you have no way of telling which modules could be included.
This is further confused by user-input, consider: __import__(sys.argv[1]).
There's a lot of theoretical information about the first problem, which is normally described as the Halting problem; the second just obviously can't be done.
From a theoretical perspective, you can never know exactly what/where modules are being imported. From a practical perspective, if you simply want to know where the modules are, check the module.__file__ attribute or run the script under python -v to find files when modules are loaded. This won't give you every module that could possibly be loaded, but will get most modules with mostly sane code.
See also: How do I find the location of Python module sources?
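As a sketch of the module.__file__ approach: after the script has run (or at the point you care about), walk sys.modules and collect the file of every module loaded so far. The helper name loaded_module_files is made up for this example; built-in modules have no __file__ and are skipped.

```python
import sys


def loaded_module_files():
    """Return {module name: file path} for modules loaded so far."""
    files = {}
    # Copy the items first: sys.modules may change while we iterate
    for name, module in list(sys.modules.items()):
        path = getattr(module, "__file__", None)
        if path:
            files[name] = path
    return files


import json  # any import made before the call shows up in the result

files = loaded_module_files()
print(files["json"])
```

Like python -v, this only reports what was actually imported during this particular run.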
This is not possible to do 100% accurately. I answered a similar question here: Dependency Testing with Python
Just an idea and I'm not sure that it will work:
You could write a module that contains a wrapper for __builtin__.__import__ (builtins.__import__ in Python 3). This wrapper would save a reference to the old __import__ and then assign to __builtin__.__import__ a function that does the following:
1. Whenever called, get the current stack trace and work out the calling function. Maybe the information in the globals parameter to __import__ is enough.
2. Get the module of that calling function and store the name of this module and what will get imported.
3. Redirect the call to the real __import__.
After you have done this you can call your application with python -m magic_module yourapp.py. The magic module must store the information somewhere where you can retrieve it later.
That's quite a question.
Static analysis is about predicting all possible run-time execution paths and making sure the program halts for a specific input at all.
That is equivalent to the Halting Problem, and unfortunately there is no generic solution.
The only way to resolve dynamic dependencies is to run the code.
