How to represent imported module or function as string? - python

I have a program that runs through a database of objects, runs each object through a long runtime function, and then takes the output of that function to update the attributes of said object. Of course, if the function is going to return the same result (because the function's code has not changed), then we don't need to run it through the function again.
One way to solve this problem is to give the function a version number or something and check the version number against a version number stored in the object. For example, object.lastchecked=='VER2'. But this requires manually updating versions and remembering to do so.
One other, more creative way to do this would be to convert that whole function or even imported module/library into a string, hash the string, and then use that as an "automatic" version number of sorts. So if objects.lastchecked!=hash(functionconvertedtostring), we would run function(object).
What I can't figure out is how to convert an existing Python module, given its import line, into a string. I could hardcode the module's path and read the file, but it has already been read into memory once. How do I get access to it?
Example:
from hugelibrary import littlefunction
for id, object in hugeobjectdb.items():
    if object.lastchecked != hash(littlefunction):
        object.attributes = littlefunction(object)

There are at least two different ways of doing this.
First, if you use some kind of version control for your code, you may be able to store the module's version inside the module file, in an object that is accessible to Python and updated automatically. Or you can use your version control tools to get the current revision.
It might be hard to do that at the function level, though, rather than at the file level. If a single function has this requirement, you might consider moving it to a separate file.
Probably preferable would be to get the text of the function itself. For that, this answer seems to do the trick: Can Python print a function definition?
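Building on that, a minimal sketch using inspect.getsource and hashlib (hugelibrary, littlefunction and hugeobjectdb are the names from the question); note that this hashes only the function's own source, not anything it calls:

import hashlib
import inspect

from hugelibrary import littlefunction

def source_hash(func):
    # Hash the function's source text; any edit to the code changes the hash.
    return hashlib.sha256(inspect.getsource(func).encode('utf-8')).hexdigest()

fingerprint = source_hash(littlefunction)
for obj_id, obj in hugeobjectdb.items():
    if obj.lastchecked != fingerprint:
        obj.attributes = littlefunction(obj)
        obj.lastchecked = fingerprint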

Related

What is the best way to use different versions of a function from a library in different scripts?

Say I write a Python script for data analysis that includes a function parse_csv which might return a dataframe in a certain format. I can then analyse and plot this output etc.
Some time later, working with different data in the same format, I decide that it's better to parse the csv in a different way, so I improve the function so that it returns a slightly different output to be plotted etc. in a slightly different way. E.g. a dataframe that now contains datetime objects for the columns instead of just the date and time as a string.
I still want all the original scripts to run. I want all of the functions for the old and new scripts to be in one python library for the project. I don't want to go back and change all of the code for the old scripts to work with the new and improved function.
What is the best way to version my functions? I can think of a few options, but I'm wondering if I'm missing anything:
Is there a way to do version control of libraries in Python, e.g. "import myLib version 0.1"? Then I could just import different versions of the library for different scripts.
I call the functions different things, e.g. parse_csv_1, parse_csv_2.
I add an argument to the function that redirects to different functions inside it, e.g.:
def parse_csv(csv, version=1):
    if version == 1:
        return parse_csv_1(csv)
    elif version == 2:
        return parse_csv_2(csv)
Is there a way to achieve what I want using the first option, as it seems cleaner to me, and would work for improvements to multiple functions across scripts? Or do I need to do this in a different way?
I have seen this question which refers to importing specific versions of libraries, but I haven't been able to see how I would install multiple versions at the same time.
That is a pretty interesting question. Let's address your options one by one:
1. Library versioning:
If you are publishing your library to PyPI or any other repository, you can release different versions and install a specific one using pip. However, you cannot have multiple versions of the same library installed in the same environment at the same time.
2. Calling it different things:
This would be the preferred method. However, I don't recommend naming the variants _1, _2. You should name them descriptively, so you understand the difference between the functions. Based on your description, you'd have parse_csv and parse_csv_date_as_datetime.
3. Redirecting with arguments
In general this is not a good idea. Arguments should not fundamentally change how your function behaves; that can lead to unexpected behaviour. Parameters should be the data you perform computations on.
In conclusion
I recommend just creating a second function. Use the original in your past code, use the new one in the new code.
If you really must have a single function, you could try applying the Strategy Pattern to handle the different data-formatting tasks: pass in a strategy function, with a default that matches the current behaviour (see the sketch below). I'd say this is probably beyond your current level, however.
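A rough sketch of that Strategy Pattern idea; the helper names and the date format here are made up purely for illustration:

from datetime import datetime

def parse_columns_as_strings(header):
    # Original behaviour: keep column labels as raw strings.
    return header.split(',')

def parse_columns_as_datetimes(header):
    # New behaviour: turn column labels into datetime objects.
    return [datetime.strptime(col, '%Y-%m-%d %H:%M') for col in header.split(',')]

def parse_csv(header, column_parser=parse_columns_as_strings):
    # The strategy is a parameter: old scripts keep the default,
    # new scripts pass parse_columns_as_datetimes explicitly.
    return column_parser(header)

print(parse_csv('2024-01-01 09:00,2024-01-02 09:00'))
print(parse_csv('2024-01-01 09:00,2024-01-02 09:00', parse_columns_as_datetimes))

This keeps a single entry point while leaving every old call site untouched, since the default reproduces the original behaviour.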

Inject code automatically to a python project and execute

I want to inject some code to all python modules in my project automatically.
I used ast.NodeTransformer and managed to change the code quite easily; the problem is that I want to run the project afterwards.
An AST is built per module, but I want to change all the modules in the project and then run it; I have this example.
The problem is that the example applies to one node, viz. one file. I want to run that file, which imports and uses other files that I also want to change, so I'm not sure how to get it done.
I know I can use some ast-to-code module, like astor, but all are third party and I don't want to deal with bugs and unexpected issues.
Don't really know how to start, any suggestions?
I know I can use some ast-to-code module, like astor, but all are third party and I don't want to deal with bugs and unexpected issues.
From Python 3.9 onward there is ast.unparse, which does exactly this AST-to-source conversion after you transform the tree.
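As a minimal sketch, assuming a toy transformer and a placeholder project directory (this rewrites the source files in place, so run it on a copy):

import ast
import pathlib

class AddPrint(ast.NodeTransformer):
    # Hypothetical injection: announce entry into every function.
    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        announce = ast.parse("print('entering %s')" % node.name).body[0]
        node.body.insert(0, announce)
        return node

for path in pathlib.Path('myproject').rglob('*.py'):  # 'myproject' is a placeholder
    tree = ast.parse(path.read_text())
    tree = AddPrint().visit(tree)
    ast.fix_missing_locations(tree)
    path.write_text(ast.unparse(tree))  # ast.unparse requires Python 3.9+

Once every module has been rewritten on disk, you can simply run the entry-point file and the transformed imports come along for free.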

How bad is it to explicitly add things to locals()?

I'm trying to dynamically load modules as explained here.
I have written a script that requires some modules that may not be installed by default on some systems (such as requests). The script code assumes that a regular import has been done (it uses requests.get).
If I use the code in the link above, to import requests I would have to use:
requests = importlib.import_module('requests')
But this leads to a lot of code duplication since I have several modules. I can't use that in a loop since the variable name must change with the imported module.
I have found that I can use:
for m in list_of_modules:
    locals()[m] = importlib.import_module(m)
And everything happens as if I had done regular imports.
(Of course the real code catches exceptions...)
So the question is: how valid/risky is this? Too good to be true, or not? (Python 2.7, if that makes a difference.)
It is explicitly invalid. The documentation for Python 2.7.15 says of the locals() function:
The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter.
locals() is a way for the program to know the list of variables in a function block. It is not a way to create local variables.
If you really need something like that, you can either use a local map, rely on the sys.modules map which is updated by import_module, or update the globals() map. Anyway, once a module was loaded, it exists (through the sys.module map) for the whole program, so it does not really make sense to store its reference in a local symbol table.
So if you really need to import a dynamically built list of modules, I would do:
for m in list_of_modules:
    globals()[m] = importlib.import_module(m)
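For completeness, a small sketch with the exception handling the question alludes to (list_of_modules here is just a hypothetical example); note that import_module also registers each module in sys.modules, so later imports elsewhere reuse the already-loaded module:

import importlib
import sys

list_of_modules = ['requests', 'json']  # hypothetical list of optional modules

for name in list_of_modules:
    try:
        globals()[name] = importlib.import_module(name)
    except ImportError:
        print('optional module missing: %s' % name)

assert 'json' in sys.modules  # import_module registered it process-wide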

How to tell whether a Python function with dependencies has changed?

tl;dr:
How can I cache the results of a Python function to disk and in a later session use the cached value if and only if the function code and all of its dependencies are unchanged since I last ran it?
In other words, I want to make a Python caching system that automatically watches out for changed code.
Background
I am trying to build a tool for automatic memoization of computational results from Python. I want the memoization to persist between Python sessions (i.e. be reusable at a later time in another Python instance, preferably even on another machine with the same Python version).
Assume I have a Python module mymodule with some function mymodule.func(). Let's say I already solved the problem of serializing/identifying the function arguments, so we can assume that mymodule.func() takes no arguments if it simplifies anything.
Also assume that I guarantee that the function mymodule.func() and all its dependencies are deterministic, so mymodule.func() == mymodule.func().
The task
I want to run the function mymodule.func() today and save its results (and any other information necessary to solve this task). When I want the same result at a later time, I would like to load the cached result instead of running mymodule.func() again, but only if the code in mymodule.func() and its dependencies are unchanged.
To simplify things, we can assume that the function is always run in a freshly started Python interpreter with a minimal script like this:
from somewhere import some_save_function  # 'somewhere' is a placeholder; any serializer will do
import mymodule

result = mymodule.func()
some_save_function(result, 'filename')
Also, note that I don't want to be overly conservative. It is probably not too hard to use the modulefinder module to find all modules involved when running the first time, and then not use the cache if any module has changed at all. But this defeats my purpose, because in my use case it is very likely that some unrelated function in an imported module has changed.
Previous work and tools I have looked at
joblib memoizes results tied to the function name, and also saves the source code so we can check if it is unchanged. However, as far as I understand it does not check upstream functions (called by mymodule.func()).
The ast module gives me the Abstract Syntax Tree of any Python code, so I guess I can (in principle) figure it all out that way. How hard would this be? I am not very familiar with the AST. (A rough sketch of this idea follows the list below.)
Can I use any of the black magic that's going on inside dill?
More trivia than a solution: IncPy, a finished/deceased research project, implemented a Python interpreter doing this by default, always. Nice idea, but never made it outside the lab.
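A rough sketch of the AST idea from above, under strong assumptions: it only follows direct calls to module-level functions reachable through func.__globals__, and misses methods, closures, classes, and C extensions:

import ast
import hashlib
import inspect

def called_names(func):
    # Names that appear as simple calls, e.g. helper(x); a rough
    # approximation of the function's direct dependencies.
    tree = ast.parse(inspect.getsource(func))
    return {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }

def deep_hash(func, seen=None):
    # Hash func's source plus, recursively, the sources of the
    # module-level functions it calls.
    seen = set() if seen is None else seen
    digest = hashlib.sha256(inspect.getsource(func).encode('utf-8'))
    for name in sorted(called_names(func)):
        dep = func.__globals__.get(name)
        if inspect.isfunction(dep) and dep not in seen:
            seen.add(dep)
            digest.update(deep_hash(dep, seen).encode('utf-8'))
    return digest.hexdigest()

The cached result would then be keyed on deep_hash(mymodule.func), so an edit anywhere in the followed call chain invalidates the cache, while changes to unrelated functions in the same module do not.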
Grateful for any input!

How do you statically find dynamically loaded modules?

How does one get (find the location of) the dynamically imported modules from a Python script?
From my understanding, Python can dynamically (at run time) load modules: be it using __import__(module_name), using exec "from x import y", or using imp.find_module("module_name") followed by imp.load_module(param1, param2, param3, param4).
Knowing that, I want to get all the dependencies for a Python file. This would include getting (or at least trying to get) the dynamically loaded modules, whether they are named by hard-coded string objects or by strings returned from a function/method.
For a normal import module_name or from x import y you can either scan the code manually or use modulefinder.
So if I want to copy one Python script and all its dependencies (including the custom dynamically loaded modules), how should I do that?
You can't; the very nature of programming (in any language) means that you cannot predict what code will be executed without actually executing it. So you have no way of telling which modules could be included.
This is further confounded by user input; consider: __import__(sys.argv[1]).
There's a lot of theoretical information about the first problem, which is normally described as the Halting Problem; the second obviously can't be done at all.
From a theoretical perspective, you can never know exactly what/where modules are being imported. From a practical perspective, if you simply want to know where the modules are, check the module.__file__ attribute or run the script under python -v to find files when modules are loaded. This won't give you every module that could possibly be loaded, but will get most modules with mostly sane code.
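For instance, a quick way to see where an already-imported module lives (json is used purely as an example):

import json
print(json.__file__)  # the filesystem path the module was loaded from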
See also: How do I find the location of Python module sources?
This is not possible to do 100% accurately. I answered a similar question here: Dependency Testing with Python
Just an idea and I'm not sure that it will work:
You could write a module that contains a wrapper for __builtin__.__import__. This wrapper would save a reference to the old __import__ and then assign a function to __builtin__.__import__ that does the following:
whenever called, get the current stack trace and work out the calling function; maybe the information in the globals parameter to __import__ is enough.
get the module of that calling function and store the module's name together with what gets imported.
redirect the call to the real __import__.
After you have done this you can call your application with python -m magic_module yourapp.py. The magic module must store the information somewhere you can retrieve it later.
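A minimal sketch of such a wrapper in Python 3 spelling (where __builtin__ became builtins); it records who imported what before deferring to the real __import__:

import builtins  # spelled __builtin__ on Python 2

_real_import = builtins.__import__
seen_imports = []

def tracing_import(name, globals=None, locals=None, fromlist=(), level=0):
    # Record which module asked for which import, then defer to the
    # real __import__.
    importer = globals.get('__name__', '?') if globals else '?'
    seen_imports.append((importer, name))
    return _real_import(name, globals, locals, fromlist, level)

builtins.__import__ = tracing_import

import json  # this import is now recorded
print(seen_imports)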
That's quite a question.
Static analysis is about predicting all possible run-time execution paths without actually running the program. Determining exactly which modules could ever be imported is equivalent to the Halting Problem, and unfortunately there is no general solution.
The only way to resolve dynamic dependencies is to run the code.
