Mocking Python iterables for use with Sphinx

I'm using Sphinx to document a project that depends on wxPython, using the autodoc extension so that it will automatically generate pages from our docstrings. The autodoc extension operates on every module you import, which is fine for our packages but is a problem when we import a large external library like wxPython. Thus, instead of letting it generate everything from wxPython, I'm using the unittest.mock library module (previously the external package Mock). The most basic setup works fine for most parts of wxPython, but I've run into a situation I can't see an easy way around (likely because of my relative unfamiliarity with mock until this week).
Currently, the end of my conf.py file has the following:
import sys
from unittest import mock

MOCK_MODULES = ['wx.lib.newevent']  # I've skipped irrelevant entries...
for module_name in MOCK_MODULES:
    sys.modules[module_name] = mock.Mock()
For all the wxPython modules but wx.lib.newevent, this works perfectly. However, here I'm using the newevent.NewCommandEvent() function[1] to create an event for a particular scenario. In this case, I get a warning on the NewCommandEvent() call with the note TypeError: 'Mock' object is not iterable.
While I can see how one would use patching to handle this for building out unit tests (which I will be doing in the next month!), I'm having a hard time seeing how to integrate that at a simple level in my Sphinx configuration.
Edit: I've just tried using MagicMock() as well; this still produces an error at the same point, though it now produces ValueError: need more than 0 values to unpack. That seems like a step in the right direction, but I'm still not sure how to handle this short of explicitly setting it up for this one module. Maybe that's the best solution, though?
Footnotes
Yes, that's a function, naming convention making it look like a class notwithstanding; wxPython follows the C++ naming conventions which are used throughout the wxWidgets toolkit.

From the error, it looks like it is actually executing newevent.NewCommandEvent(), so I assume that somewhere in your code you have a top-level line something like this:
import wx.lib.newevent
...
event, binder = wx.lib.newevent.NewCommandEvent()
When autodoc imports the module, it tries to run this line of code, but since NewCommandEvent is actually a Mock object, Python can't unpack its output into the (event, binder) tuple. There are two possible solutions. The first is to change your code so that this is not executed on import, perhaps by wrapping it inside if __name__ == '__main__'. I would recommend this solution because creating objects like this on import can often have problematic side effects.
The second solution is to tell the Mock object to return appropriate values thus:
wx.lib.newevent.NewCommandEvent = mock.Mock(return_value=(mock.Mock(), mock.Mock()))
However, if you are doing anything in your code with the returned values you might run into the same problem further down the line.
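For what it's worth, putting both pieces together at the end of conf.py might look something like this untested sketch (the module list is abbreviated, and each mocked submodule is wired up as an attribute of its parent so that wx.lib.newevent resolves to the same mock that sits in sys.modules):

import sys
from unittest import mock

MOCK_MODULES = ['wx', 'wx.lib', 'wx.lib.newevent']  # abbreviated list
for module_name in MOCK_MODULES:
    sys.modules[module_name] = mock.MagicMock()

# Wire submodules to their parents so attribute access finds them.
sys.modules['wx'].lib = sys.modules['wx.lib']
sys.modules['wx.lib'].newevent = sys.modules['wx.lib.newevent']

# Special case: NewCommandEvent() is unpacked into a 2-tuple at import
# time, so give the mock a matching return value.
sys.modules['wx.lib.newevent'].NewCommandEvent = mock.Mock(
    return_value=(mock.Mock(), mock.Mock()))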

Related

How to tell whether a Python function with dependencies has changed?

tl;dr:
How can I cache the results of a Python function to disk and in a later session use the cached value if and only if the function code and all of its dependencies are unchanged since I last ran it?
In other words, I want to make a Python caching system that automatically watches out for changed code.
Background
I am trying to build a tool for automatic memoization of computational results from Python. I want the memoization to persist between Python sessions (i.e. be reusable at a later time in another Python instance, preferably even on another machine with the same Python version).
Assume I have a Python module mymodule with some function mymodule.func(). Let's say I already solved the problem of serializing/identifying the function arguments, so we can assume that mymodule.func() takes no arguments if it simplifies anything.
Also assume that I guarantee that the function mymodule.func() and all its dependencies are deterministic, so mymodule.func() == mymodule.func().
The task
I want to run the function mymodule.func() today and save its results (and any other information necessary to solve this task). When I want the same result at a later time, I would like to load the cached result instead of running mymodule.func() again, but only if the code in mymodule.func() and its dependencies are unchanged.
To simplify things, we can assume that the function is always run in a freshly started Python interpreter with a minimal script like this:
from some_save_module import some_save_function  # hypothetical serializer module
import mymodule

result = mymodule.func()
some_save_function(result, 'filename')
Also, note that I don't want to be overly conservative. It is probably not too hard to use the modulefinder module to find all modules involved when running the first time, and then not use the cache if any module has changed at all. But this defeats my purpose, because in my use case it is very likely that some unrelated function in an imported module has changed.
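For reference, the conservative modulefinder approach I'd like to improve on would look something like this (run_func.py is a hypothetical name standing in for the minimal script above):

from modulefinder import ModuleFinder

finder = ModuleFinder()
finder.run_script('run_func.py')
for name, module in finder.modules.items():
    # Invalidating the cache when any of these files changes is too coarse.
    print(name, getattr(module, '__file__', None))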
Previous work and tools I have looked at
joblib memoizes results tied to the function name, and also saves the source code so we can check whether it is unchanged. However, as far as I understand, it does not check upstream functions (those called by mymodule.func()).
The ast module gives me the Abstract Syntax Tree of any Python code, so I guess I can (in principle) figure it all out that way. How hard would this be? I am not very familiar with the AST.
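For what it's worth, hashing a single function's AST is the easy half; walking its dependencies is the hard part. A minimal sketch of the easy half, for a module-level function (comments and formatting don't affect the hash, since they never reach the tree):

import ast
import hashlib
import inspect

def source_hash(func):
    # Parse the function's source and hash a canonical dump of its AST.
    tree = ast.parse(inspect.getsource(func))
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()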
Can I use any of all the black magic that's going on inside dill?
More trivia than a solution: IncPy, a finished/deceased research project, implemented a Python interpreter that did this by default, always. Nice idea, but it never made it outside the lab.
Grateful for any input!

Python: how to ensure that import comes from defining module?

I think the following is bad style:
# In foo_pkg/bla_mod.py:
magic_value = 42

# In foo_pkg/bar_mod.py:
from .bla_mod import magic_value

# In doit.py:
from foo_pkg.bar_mod import magic_value
Instead, I'd like that to always import an object from the module where it has been defined, i.e. in this case:
# In doit.py:
from foo_pkg.bla_mod import magic_value
Finding issues of this sort by hand gets tedious very quickly (for each imported object, you have to open the module and check whether it defines the object or imports it from another module).
What is the best way to automate this check? To my surprise, neither pylint nor pyflakes seem to have an appropriate checker, but maybe there's another tool (or even some trick that can be used in Python itself)?
Problem statement in a nutshell: given a bunch of python source files, find every import of an object from a module that does not itself define the object.
I know there are libraries (including the standard library) where one module provides the main external API and imports the necessary symbols from other modules internally. However, that's not the case in the code base I'm working with, here these are artifacts of refactorings that I really want to eliminate.
Here's a draft script that solves the problem in less than 100 lines: http://pastebin.com/CFsR6b3s
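In case the pastebin link rots, here is a rough sketch of the same idea using ast plus importlib. It only handles absolute from X import y imports and only counts top-level def/class/assignment statements as definitions, which matches the "no deliberate re-exports" assumption above:

import ast
import importlib.util

def toplevel_names(path):
    # Names bound at module top level by def/class/assignment.
    # Imports are deliberately excluded: those are the re-exports we hunt.
    with open(path) as f:
        tree = ast.parse(f.read())
    names = set()
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            names.update(t.id for t in node.targets if isinstance(t, ast.Name))
    return names

def check_file(path):
    # Flag every 'from X import y' where X's source does not define y.
    with open(path) as f:
        tree = ast.parse(f.read())
    for node in ast.walk(tree):
        if not (isinstance(node, ast.ImportFrom) and node.module and node.level == 0):
            continue  # skip plain imports and relative imports
        spec = importlib.util.find_spec(node.module)
        if spec is None or not (spec.origin or '').endswith('.py'):
            continue  # builtin, extension module, or unresolvable
        defined = toplevel_names(spec.origin)
        for alias in node.names:
            if alias.name != '*' and alias.name not in defined:
                print('%s:%d: %s is not defined in %s'
                      % (path, node.lineno, alias.name, node.module))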

How to call methods of the instance of a library already in scope of the current running test case

I have a library that interfaces with an external tool and exposes some basic keywords for use from robotframework; this library is implemented as a Python package, and I would like to implement extended functionality that implements complex logic, and exposes more keywords, within modules of this package. The package is given test case scope, but I'm not entirely sure how this works. If I suggest a few ways I have thought of, could someone with a bit more knowledge let me know where I'm on the right track, and where I'm barking up the wrong tree...
Use an instance variable - if the scope is such that the Python interpreter will see the package as imported by the current test case (i.e. this is treated as a separate package in different test cases rather than as a separate instance of the same package), then on initialisation I could set a global variable INSTANCE to self and then, from another module within the package, import INSTANCE and use it.
Use an instance dictionary - if the scope is such that all imports see the package as the same, I could use robot.running.context to set a dictionary key such that there is an item in the instance dictionary for each context where the package has been imported - this would then mean that I could use the same context variable as a lookup key in the modules that are based on this. (The disadvantage of this one is that it will prevent garbage collection until the package itself is out of scope, and relies on it being in scope persistently.)
A context variable that I am as of yet unaware of that will give me the instance that is in scope. The docs are fairly difficult to search, so it's fully possible that there is something that I'm missing that will make this trivial. Also just as good would be something that allowed me to call the keywords that are in scope.
Some excellent possibility I haven't considered....
So can anyone help?
Credit for this goes to Kevin O. from the robotframework user group, but essentially the magic lives in robot.libraries.BuiltIn.BuiltIn().get_library_instance(library_name) which can be used like this:
from robot.libraries.BuiltIn import BuiltIn

class SeleniumTestLibrary(object):
    def element_should_be_really_visible(self, locator):
        s2l = BuiltIn().get_library_instance('Selenium2Library')
        element = s2l._element_find(locator, True, False)
It sounds like you are talking about monkeypatching the imported code, so that other modules which import that package will also see your runtime modifications. (Correct me if I'm wrong; there are a couple of bits in your question that I'm not quite following)
For simple package imports, this should work:
import my_package

def method_override():
    return "Foo"

my_package.some_method = method_override
my_package, in this case, refers to the imported module, and is not just a local name, so other modules will see the overridden method.
This won't work in cases where other code has already done
from my_package import some_method
Since in that case, some_method is a local name in the place it is imported. If you replace the method elsewhere, that change won't be seen.
If this is happening, then you either need to change the source to import the entire module, or patch a little bit deeper, by replacing method internals:
import my_package

def method_override():
    return "Foo"

# func_code is the Python 2 name; on Python 3 use __code__ instead.
my_package.some_method.func_code = method_override.func_code
At that point, it doesn't matter how the method was imported in any other module; the code object associated with the method has been replaced, and your new code will run rather than the original.
The only thing to worry about in that case is that the module is imported from the same path in every case. The Python interpreter will try to reuse existing modules, rather than re-import and re-initialize them, whenever they are imported from the same path.
However, if your python path is set up to contain two directories, say: '/foo' and '/foo/bar', then these two imports
from foo.bar import baz
and
from bar import baz
would end up loading the module twice, and defining two versions of any objects (methods, classes, etc) in the module. If that happens, then patching one will not affect the other.
If you need to guard against that case, then you may have to traverse sys.modules, looking for the imported package, and patching each version that you find. This, of course, will only work if all of the other imports have already happened, you can't do that pre-emptively (without writing an import hook, but that's another level deeper again :) )
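A rough sketch of that traversal (the matching rule here, comparing the last component of the dotted name, is an assumption about how the duplicate copies show up; adjust it to your layout):

import sys

def patch_all_copies(module_name, attr, replacement):
    # Replace attr on every loaded module whose name is, or ends with,
    # module_name -- catching modules imported twice under different paths.
    for name, module in list(sys.modules.items()):
        if module is None:
            continue
        if name == module_name or name.endswith('.' + module_name):
            if hasattr(module, attr):
                setattr(module, attr, replacement)

So after all the other imports have happened, patch_all_copies('baz', 'some_method', method_override) would hit both the foo.bar.baz and bar.baz copies from the example above.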
Are you sure you can't just fork the original package and extend it directly? That would be much easier :)

how do you statically find dynamically loaded modules

How does one find (the location of) the modules that a Python script imports dynamically?
So, from my understanding, Python can dynamically (at run time) load modules, be it using __import__(module_name), exec "from x import y", or imp.find_module("module_name") followed by imp.load_module(param1, param2, param3, param4).
Knowing that, I want to get all the dependencies for a Python file. This would include getting (or at least trying to get) the dynamically loaded modules: those loaded using either hard-coded string objects or strings returned by a function/method.
For a normal import module_name or from x import y, you can either scan the code manually or use modulefinder.
So if I want to copy one python script and all its dependencies (including the custom dynamically loaded modules) how should I do that ?
You can't; the very nature of programming (in any language) means that you cannot predict what code will be executed without actually executing it. So you have no way of telling which modules could be included.
This is further confused by user-input, consider: __import__(sys.argv[1]).
There's a lot of theoretical information about the first problem, which is normally described as the Halting problem; the second just obviously can't be done.
From a theoretical perspective, you can never know exactly what/where modules are being imported. From a practical perspective, if you simply want to know where the modules are, check the module.__file__ attribute or run the script under python -v to find files when modules are loaded. This won't give you every module that could possibly be loaded, but will get most modules with mostly sane code.
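For example, to see where an already-imported module lives:

import json
print(json.__file__)  # e.g. /usr/lib/python3.x/json/__init__.py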
See also: How do I find the location of Python module sources?
This is not possible to do 100% accurately. I answered a similar question here: Dependency Testing with Python
Just an idea and I'm not sure that it will work:
You could write a module that contains a wrapper for __builtin__.__import__. This wrapper would save a reference to the old __import__ and then assign a function to __builtin__.__import__ that does the following:
whenever called, get the current stack trace and work out the calling function. Maybe the information in the globals parameter to __import__ is enough.
get the module of that calling function and store the name of this module and what will get imported
redirect the call to the real __import__
After you have done this you can call your application with python -m magic_module yourapp.py. The magic module must store the information somewhere where you can retrieve it later.
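A minimal sketch of such a wrapper (Python 3 spelling, i.e. builtins rather than __builtin__; instead of inspecting the stack trace it reads the caller's __name__ from the globals argument, as suggested above):

import builtins

_real_import = builtins.__import__
imports_seen = []  # (importing module, imported name, fromlist) triples

def _recording_import(name, globals=None, locals=None, fromlist=(), level=0):
    caller = globals.get('__name__', '<unknown>') if globals else '<unknown>'
    imports_seen.append((caller, name, tuple(fromlist or ())))
    return _real_import(name, globals, locals, fromlist, level)

builtins.__import__ = _recording_import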
That's quite a question.
Static analysis is about predicting all possible run-time execution paths, and even deciding whether the program halts for a specific input at all is equivalent to the Halting problem, for which there is unfortunately no general solution.
The only way to resolve dynamic dependencies is to run the code.

Is there any way to clear the python bytecode cache?

Each unit test I'm running writes Python code out to a file, then imports it as a module. The problem is that the code changes between tests, but further import statements don't pick up the modified module.
I think what I need is a way to either force a reload of the module or clear the internal bytecode cache. Any ideas?
Thanks!
Reimporting modules is tricky to get right in all the edge cases. The documentation for reload mentions some of them. Depending on what you are testing, you may be better off testing the imports with separate invocations of the interpreter by running each via, say, subprocess. It will likely be slower, but also likely safer and more accurate testing.
Use reload().
Reload a previously imported module. The argument must be a module object, so it must have been successfully imported before. This is useful if you have edited the module source file using an external editor and want to try out the new version without leaving the Python interpreter. The return value is the module object (the same as the module argument).
However, the module needs to be already loaded. A workaround is to handle the resulting NameError:
try:
    reload(math)
except NameError:
    import math
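Note that on Python 3, reload() is no longer a builtin; the equivalent is importlib.reload():

import importlib
import mymodule  # stand-in for the module you want to reload

importlib.reload(mymodule)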
Write your code to differently-named modules. Writing new code into an existing file and trying to import it again will not work well.
Alternatively, you can clobber sys.modules. For example:
import sys
import unittest

class MyTestCase(unittest.TestCase):
    def setUp(self):
        # Record sys.modules here so we can restore it in tearDown.
        self.old_modules = dict(sys.modules)

    def tearDown(self):
        # Remove any new modules imported during the test run. This lets us
        # import the same source files for more than one test.
        for m in [m for m in sys.modules if m not in self.old_modules]:
            del sys.modules[m]
I ran into a similar situation. I later found that the whitespace indentation technique used matters: especially on Windows platforms, make sure a uniform technique is adopted throughout the module, i.e. use either tabs or spaces exclusively.
