Python, unit-testing and mocking imports - python

I am in a project where we are starting refactoring some massive code base. One problem that immediately sprang up is that each file imports a lot of other files. How do I in an elegant way mock this in my unit test without having to alter the actual code so I can start to write unit-tests?
As an example: The file with the functions I want to test, imports ten other files which is part of our software and not python core libs.
I want to be able to run the unit tests as separately as possible and for now I am only going to test functions that does not depend on things from the files that are being imported.
Thanks for all the answers.
I didn't really know what I wanted to do from the start but now I think I know.
Problem was that some imports was only possible when the whole application was running because of some third-party auto-magic. So I had to make some stubs for these modules in a directory which I pointed out with sys.path
Now I can import the file which contains the functions I want to write tests for in my unit-test file without complaints about missing modules.

If you want to import a module while at the same time ensuring that it doesn't import anything, you can replace the __import__ builtin function.
For example, use this class:
class ImportWrapper(object):
def __init__(self, real_import):
self.real_import = real_import
def wrapper(self, wantedModules):
def inner(moduleName, *args, **kwargs):
if moduleName in wantedModules:
print "IMPORTING MODULE", moduleName
self.real_import(*args, **kwargs)
else:
print "NOT IMPORTING MODULE", moduleName
return inner
def mock_import(self, moduleName, wantedModules):
__builtins__.__import__ = self.wrapper(wantedModules)
try:
__import__(moduleName, globals(), locals(), [], -1)
finally:
__builtins__.__import__ = self.real_import
And in your test code, instead of writing import myModule, write:
wrapper = ImportWrapper(__import__)
wrapper.mock_import('myModule', [])
The second argument to mock_import is a list of module names you do want to import in inner module.
This example can be modified further to e.g. import other module than desired instead of just not importing it, or even mocking the module object with some custom object of your own.

If you really want to muck around with the python import mechanism, take a look at the ihooks module. It provides tools for changing the behavior of the __import__ built-in. But it's not clear from your question why you need to do this.

"imports a lot of other files"? Imports a lot of other files that are part of your customized code base? Or imports a lot of other files that are part of the Python distribution? Or imports a lot of other open source project files?
If your imports don't work, you have a "simple" PYTHONPATH problem. Get all of your various project directories onto a PYTHONPATH that you can use for testing. We have a rather complex path, in Windows we manage it like this
#set Part1=c:\blah\blah\blah
#set Part2=c:\some\other\path
#set that=g:\shared\stuff
set PYTHONPATH=%part1%;%part2%;%that%
We keep each piece of the path separate so that we (a) know where things come from and (b) can manage change when we move things around.
Since the PYTHONPATH is searched in order, we can control what gets used by adjusting the order on the path.
Once you have "everything", it becomes a question of trust.
Either
you trust something (i.e., the Python code base) and just import it.
Or
You don't trust something (i.e., your own code) and you
test it separately and
mock it for stand-alone testing.
Would you test the Python libraries? If so, you've got a lot of work. If not, then, you should perhaps only mock out the things you're actually going to test.

No difficult manipulation is necessary if you want a quick-and-dirty fix before your unit-tests.
If the unit tests are in the same file as the code you wish to test, simply delete unwanted module from the globals() dictionary.
Here is a rather lengthy example: suppose you have a module impp.py with contents:
value = 5
Now, in your test file you can write:
>>> import impp
>>> print globals().keys()
>>> def printVal():
>>> print impp.value
['printVal', '__builtins__', '__file__', 'impp', '__name__', '__doc__']
Note that impp is among the globals, because it was imported. Calling the function printVal that uses impp module still works:
>>> printVal()
5
But now, if you remove impp key from globals()...
>>> del globals()['impp']
>>> print globals().keys()
['printVal', '__builtins__', '__file__', '__name__', '__doc__']
...and try to call printVal(), you'll get:
>>> printVal()
Traceback (most recent call last):
File "test_imp.py", line 13, in <module>
printVal()
File "test_imp.py", line 5, in printVal
print impp.value
NameError: global name 'impp' is not defined
...which is probably exactly what you're trying to achieve.
To use it in your unit-tests, you can delete the globals just before running the test suite, e.g. in __main__:
if __name__ == '__main__':
del globals()['impp']
unittest.main()

In your comment above, you say you want to convince python that certain modules have already been imported. This still seems like a strange goal, but if that's really what you want to do, in principle you can sneak around behind the import mechanism's back, and change sys.modules. Not sure how this'd work for package imports, but should be fine for absolute imports.

Related

get description of an installed package without actual importing it

If you type this:
import somemodule
help(somemodule)
it will print out paged package description. I would need to get the same description as a string but without importing this package to the current namespace. Is this possible? It surely is, because anything is possible in Python, but what is the most elegant/pythonic way of doing so?
Side note: by elegant way I mean without opening a separate process and capturing its stdout... ;)
In other words, is there a way to peek into a unimported but installed package and get its description? Maybe something with importlib.abc.InspectLoader? But I have no idea how to make it work the way I need.
UPDATE: I need not just not polluting the namespace but also do this without leaving any traces of itself or dependent modules in memory and in sys.modules etc. Like it was never really imported.
UPDATE: Before anyone asks me why I need it - I want to list all installed python packages with their description. But after this I do not want to have them imported in sys.modules nor occupying excessive space in memory because there can be a lots of them.
The reason that you will need to import the module to get a help string is that in many cases, the help strings are actually generated in code. It would be pointlessly difficult to parse the text of such a package to get the string since you would then have to write a small Python interpreter to reconstruct the actual string.
That being said, there are ways of completely deleting a temporarily imported modules based on this answer, which summarizes a thread that appeared on the Python mailing list around 2003: http://web.archive.org/web/20080926094551/http://mail.python.org/pipermail/python-list/2003-December/241654.html. The methods described here will generally only work if the module is not referenced elsewhere. Otherwise the module will be unloaded in the sense that import will reload it from scratch instead of using the existing sys.modules entry, but the module will still live in memory.
Here is a function that does approximately what you want and even prints a warning if the module does not appear to have been unloaded. Unlike the solutions proposed in the linked answer, this function really handles all the side-effects of loading a module, including the fact that importing one package may import other external packages into sys.modules:
import sys, warnings
def get_help(module_name):
modules_copy = sys.modules.copy()
module = __import__(module_name)
h = help(module)
for modname in list(sys.modules):
if modname not in modules_copy:
del sys[modname]
if sys.getrefcount(module) > 1:
warnings.warn('Module {} is likely not to be completely wiped'.format(module_name))
del module
return h
The reason that I make a list of the keys in the final loop is that it is inadvisable to modify a dictionary (or any other iterable) as you iterate through it. At least in Python 3, dict.keys() returns an iterable that is backed by the dictionary itself, not a frozen copy. I am not sure if h = ... and return h are even necessary, but in the worst case, h is just None.
Well, if you are only worried about keeping the global namespace tidy, you could always import in a function:
>>> def get_help():
... import math
... help(math)
...
>>> math
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'math' is not defined
I would suggest a different approach, if i understand you correctly, you wish to read a portion of a package, without importing it (even within a function with local scope). I would suggest a method to do so would be via accessing the (python_path)/Lib/site-packages/(package_name)/ and reading the contents of the respective files as an alternative to importing the module so Python can.

Importing Class in Python Subpackage imports more than requested

Overview
I'm running some scientific simulations and I want to process the resulting data in Python. The simulation produces a custom data type that is not used outside of the chain of programs that the authors of the simulation produced, so unfortunately I need what they provide me.
They want me to install two files:
A module called sdds.py that defines a class that provides all user functions and two demos
A compiled module called sddsdatamodule.so that only provides helper functions to sdds.py.
(I find it strange that they're offering me two modules that are so inextricably connected, it doesn't seem like good coding practice to me, but using their code is probably better than rewriting things from scratch.) I'd prefer not to install them directly into my path, side by side. They come from the same company, they're designed to do one specific task together: access and manipulate SDDS-type files.
So I thought I would put them in a package. I could install that on my path, it would be self-contained, and I could easily find and uninstall or upgrade the modules from one location. Then I could hide their un-Pythonic solution in a more-Pythonic package without significantly rewriting things. Seems elegant.
Details
The package I actually use is found here:
http://www.aps.anl.gov/Accelerator_Systems_Division/Accelerator_Operations_Physics/software.shtml#PythonBinaries
Unfortunately, they only support Windows and Mac OS X right now. Compiling the source code is quite onerous, and apparently they have no significant requests for Linux/Unix. I have a Mac, so thankfully this isn't a problem for me.
So my directory tree looks like this:
SDDSPython/ My toplevel package
__init__.py Designed to only import the SDDS class
sdds.py Defines SDDS class and two demo methods
sddsdatamodule.so Defines sddsdata module used by SDDS class.
My __init__.py file literally only contains this:
from sdds import SDDS
The sdds.py file contains the class definition and the two demo definitions. The only other code in the sdds.py file is:
import sddsdata, sys, time
class SDDS:
(lots of code here)
def demo(output):
(lots of code here)
def demo2(output):
(lots of code here)
I can then import SDDSPython and check, using dir:
>>> import SDDSPython
>>> dir(SDDSPython)
['SDDS', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', 'sdds', 'sddsdata']
So I can now access the SDDS class via SDDSPython.SDDS
Question
How on earth did SDDSPython.sdds and SDDSPython.sddsdata get loaded into the SDDSPython namespace??
>>> SDDSPython.sdds
<module 'SDDSPython.sdds' from 'SDDSPython/sdds.pyc'>
>>> SDDSPython.sddsdata
<module 'SDDSPython.sddsdata' from 'SDDSPython/sddsdatamodule.so'>
I thought by creating an __init__.py file I was specifically excluding the sdds and sddsdata modules from being loaded into the SDDSPython namespace. What is going on? I can only assume this is happening due to something in the sddsdatamodule.so file? But how can a module affect its parent's namespace like that? I'm rather lost, and I don't know where to start. I've looked at the C code, but I don't see anything suspicious. To be fair- I probably don't know what something suspicious would look like, I'm probably not familiar enough with programming C extensions for Python.
Curious question--I did some investigation for you using a similar test case.
XML/
__init__.py -from indent import XMLIndentGenerator
indent.py -contains class XMLIndentGenerator, and Xml
Sink.py
It appears that importing a class from a module, even though you are importing just a portion, the entire module is accessible in the way you described, that is:
>>>import XML
>>>XML.indent
<module 'XML.indent' from 'XML\indent.py'>
>>>XML.indent.Xml #did not include this in the from
<class 'XML.indent.Xml'>
>>>XML.Sink
Traceback (most recent call last):
AttributeError:yadayada no attribute 'Sink'
This is expected, since I did not import Sink in __init__.py.....BUT!
I added a line to indent.py:
import Sink
class XMLIndentGenerator(XMLGenerator):
(code)
Now, since this class imports a module contained within the XML package, if i do:
>>>import XML
>>>XML.Sink
<module 'XML.Sink' from 'XML\Sink.pyc'>
So, it appears that because your imported sdds module also imports sddsdata, you are able to access it. That answers the "How" portion of your question, but "why" this is the case, I'm sure there's an answer somewhere in the docs :)
I hope this helps - I was literally doing this as I was typing the answer! A learning experience for me as well.
This happens because python imports don't work the way you might think. They work like this:
the import machinery looks for a file that should be the module requested from the import
a types.ModuleType instance is created, several attributes on it are set to the corresponding file (__file__, __name__ and so on), and that object is inserted into sys.modules under the fully qualified module name it would have.
if this is a submodule import (ie, sdds.py which is a submodule in SDDSPython), the newly created module is attached as an attribute to the existing python module of the parent package.
the file is "executed" with that module as its global scope; all names defined by that file appear as attributes of the module.
in the case of a from import, an attribute from the module may be returned to the importing script.
So that means if I import a module (say, foo.py) that has, as its source only:
import bar
then there is a global in foo, called bar, and I can access it as foo.bar.
There is no capacity in python for "only execute the part of this python script i want to use right now." The whole thing runs.

python unittest faulty module

I have a module that I need to test in python.
I'm using the unittest framework but I ran into a problem.
The module has some method definitions, one of which is used when it's imported (readConfiguration) like so:
.
.
.
def readConfiguration(file = "default.xml"):
# do some reading from xml
readConfiguration()
This is a problem because when I try to import the module it also tries to run the "readConfiguration" method which fails the module and the program (a configuration file does not exist in the test environment).
I'd like to be able to test the module independent of any configuration files.
I didn't write the module and it cannot be re-factored.
I know I can include a dummy configuration file but I'm looking for a "cleaner", more elegant, solution.
As commenters have already pointed out, imports should never have side effects, so try to get the module changed if at all possible.
If you really, absolutely, cannot do this, there might be another way: let readConfiguration() be called, but stub out its dependencies. For instance, if it uses the builtin open() function, you could mock that, as demonstrated in the mock documentation:
>>> mock = MagicMock(return_value=sentinel.file_handle)
>>> with patch('builtins.open', mock):
... import the_broken_module
... # do your testing here
Replace sentinel.file_handle with StringIO("<contents of mock config file>") if you need to supply actual content.
It's brittle as it depends on the implementation of readConfiguration(), but if there really is no other way, it might be useful as a last resort.

How does/should global data in modules across packages be managed in Python/other languages?

I am trying to design the package and module system for a programming language (Heron) which can be both compiled and interpreted, and from what I have seen I really like the Python approach. Python has a rich choice of modules, which seems to contribute largely to its success.
What I don`t know is what happens in Python if a module is included in two different compiled packages: are there separate copies made of the data or is it shared?
Related to this are a bunch of side-questions:
Am I right in assuming that packages can be compiled in Python?
What are there pros and cons to the two approaches (copying or sharing of module data)?
Are there widely known problems with the Python module system, from the point of view of the Python community? For example is there a PEP under consideration for enhancing modules/packages?
Are there certain aspects of the Python module/package system which wouldn`t work well for a compiled language?
Well, you asked a lot of questions. Here are some hints to get a bit further:
a. Python code is lexed and compiled into Python specific instructions, but not compiled to machine executable code. The ".pyc" file is automatically created whenever you run python code that does not match the existing .pyc timestamp. This feature can be turned off. You might play with the dis module to see these instructions.
b. When a module is imported, it is executed (top to bottom) in its own namespace and that namespace cached globally. When you import from another module, the module is not executed again. Remember that def is just a statement. You may want to put a print('compiling this module') statement in your code to trace it.
It depends.
There were recent enhancements, mostly around specifying which module needed to be loaded. Modules can have relative paths so that a huge project might have multiple modules with the a same name.
Python itself won't work for a compiled language. Google for "unladen swallow blog" to see the tribulations of trying to speed up a language where "a = sum(b)" can change meanings between executions. Outside of corner cases, the module system forms a nice bridge between source code and a compiled library system. The approach works well, and Python's easy wrapping of C code (swig, etc.) helps.
Modules are the only truly global objects in Python, with all other global data based around the module system (which uses sys.modules as a registry). Packages are simply modules with special semantics for importing submodules. "Compiling" a .py file into a .pyc or .pyo isn't compilation as understood for most languages: it only checks the syntax and creates a code object which, when executed in the interpreter, creates the module object.
example.py:
print "Creating %s module." % __name__
def show_def(f):
print "Creating function %s.%s." % (__name__, f.__name__)
return f
#show_def
def a():
print "called: %s.a" % __name__
Interactive session:
>>> import example
# first sys.modules['example'] is checked
# since it doesn't exist, example.py is found and "compiled" to example.pyc
# (since example.pyc doesn't exist, same would happen if it was outdated, etc.)
Creating example module. # module code is executed
Creating function example.a. # def statement executed
>>> example.a()
called: example.a
>>> import example
# sys.modules['example'] found, local variable example assigned to that object
# no 'Creating ..' output
>>> d = {"__name__": "fake"}
>>> exec open("example.py") in d
# the first import in this session is very similar to this
# in that it creates a module object (which has a __dict__), initializes a few
# variables in it (__builtins__, __name__, and others---packages' __init__
# modules have their own as well---look at some_module.__dict__.keys() or
# dir(some_module))
# and executes the code from example.py in this dict (or the code object stored
# in example.pyc, etc.)
Creating fake module. # module code is executed
Creating function fake.a. # def statement executed
>>> d.keys()
['__builtins__', '__name__', 'a', 'show_def']
>>> d['a']()
called: fake.a
Your questions:
They are compiled, in a sense, but not as you would expect if you're familiar with how C compilers work.
If the data is immutable, copying is feasible, and should be indistinguishable from sharing except for object identity (is operator and id() in Python).
Imports may or may not execute code (they always assign a local variable to an object, but that poses no problems) and may or may not modify sys.modules. You must be careful to not import in threads, and generally it is best to do all imports at the top of every module: this leads to a cascading graph so all the imports are done at once and then __main__ continues and does the Real Work™.
I don't know of any current PEP, but there's already a lot of complex machinery in place, too. For example packages can have a __path__ attribute (really a list of paths) so submodules don't have to be in the same directory, and these paths can even be computed at runtime! (Example mungepath package below.) You can have your own import hooks, use import statements inside functions, directly call __import__, and I wouldn't be surprised to find 2-3 other unique ways to work with packages and modules.
A subset of the import system would work in a traditionally-compiled language, as long as it was similar to something like C's #include. You could run the "first level" of execution (creating the module objects) in the compiler, and compile those results. There are significant drawbacks to this, however, and amounts to separate execution contexts for module-level code and functions executed at runtime (and some functions would have to run in both contexts!). (Remember in Python that every statement is executed at runtime, even def and class statements.)
I believe this is the main reason traditionally-compiled languages restrict "top-level" code to class, function, and object declarations, eliminating this second context. Even then, you have initialization problems for global objects in C/C++ (and others), unless managed carefully.
mungepath/__init__.py:
print __path__
__path__.append(".") # CWD, would be different in non-example code
print __path__
from . import example # this is example.py from above, and is NOT in mungepath/
# note that this is a degenerate case, in that we now have two names for the
# 'same' module: example and mungepath.example, but they're really different
# modules with different functions (use 'is' or 'id()' to verify)
Interactive session:
>>> import example
Creating example module.
Creating function example.a.
>>> example.__dict__.keys()
['a', '__builtins__', '__file__', 'show_def', '__package__',
'__name__', '__doc__']
>>> import mungepath
['mungepath']
['mungepath', '.']
Creating mungepath.example module.
Creating function mungepath.example.a.
>>> mungepath.example.a()
called: mungepath.example.a
>>> example is mungepath.example
False
>>> example.a is mungepath.example.a
False
Global data is scoped at the interpreter level.
"packages" can be compiled as a package is just a collection of modules which themselves can be compiled.
I am not sure I understand given the established scoping of data.

Python includes, module scope issue

I'm working on my first significant Python project and I'm having trouble with scope issues and executing code in included files. Previously my experience is with PHP.
What I would like to do is have one single file that sets up a number of configuration variables, which would then be used throughout the code. Also, I want to make certain functions and classes available globally. For example, the main file would include a single other file, and that file would load a bunch of commonly used functions (each in its own file) and a configuration file. Within those loaded files, I also want to be able to access the functions and configuration variables. What I don't want to do, is to have to put the entire routine at the beginning of each (included) file to include all of the rest. Also, these included files are in various sub-directories, which is making it much harder to import them (especially if I have to re-import in every single file).
Anyway I'm looking for general advice on the best way to structure the code to achieve what I want.
Thanks!
In python, it is a common practice to have a bunch of modules that implement various functions and then have one single module that is the point-of-access to all the functions. This is basically the facade pattern.
An example: say you're writing a package foo, which includes the bar, baz, and moo modules.
~/project/foo
~/project/foo/__init__.py
~/project/foo/bar.py
~/project/foo/baz.py
~/project/foo/moo.py
~/project/foo/config.py
What you would usually do is write __init__.py like this:
from foo.bar import func1, func2
from foo.baz import func3, constant1
from foo.moo import func1 as moofunc1
from foo.config import *
Now, when you want to use the functions you just do
import foo
foo.func1()
print foo.constant1
# assuming config defines a config1 variable
print foo.config1
If you wanted, you could arrange your code so that you only need to write
import foo
At the top of every module, and then access everything through foo (which you should probably name "globals" or something to that effect). If you don't like namespaces, you could even do
from foo import *
and have everything as global, but this is really not recommended. Remember: namespaces are one honking great idea!
This is a two-step process:
In your module globals.py import the items from wherever.
In all of your other modules, do "from globals import *"
This brings all of those names into the current module's namespace.
Now, having told you how to do this, let me suggest that you don't. First of all, you are loading up the local namespace with a bunch of "magically defined" entities. This violates precept 2 of the Zen of Python, "Explicit is better than implicit." Instead of "from foo import *", try using "import foo" and then saying "foo.some_value". If you want to use the shorter names, use "from foo import mumble, snort". Either of these methods directly exposes the actual use of the module foo.py. Using the globals.py method is just a little too magic. The primary exception to this is in an __init__.py where you are hiding some internal aspects of a package.
Globals are also semi-evil in that it can be very difficult to figure out who is modifying (or corrupting) them. If you have well-defined routines for getting/setting globals, then debugging them can be much simpler.
I know that PHP has this "everything is one, big, happy namespace" concept, but it's really just an artifact of poor language design.
As far as I know program-wide global variables/functions/classes/etc. does not exist in Python, everything is "confined" in some module (namespace). So if you want some functions or classes to be used in many parts of your code one solution is creating some modules like: "globFunCl" (defining/importing from elsewhere everything you want to be "global") and "config" (containing configuration variables) and importing those everywhere you need them. If you don't like idea of using nested namespaces you can use:
from globFunCl import *
This way you'll "hide" namespaces (making names look like "globals").
I'm not sure what you mean by not wanting to "put the entire routine at the beginning of each (included) file to include all of the rest", I'm afraid you can't really escape from this. Check out the Python Packages though, they should make it easier for you.
This depends a bit on how you want to package things up. You can either think in terms of files or modules. The latter is "more pythonic", and enables you to decide exactly which items (and they can be anything with a name: classes, functions, variables, etc.) you want to make visible.
The basic rule is that for any file or module you import, anything directly in its namespace can be accessed. So if myfile.py contains definitions def myfun(...): and class myclass(...) as well as myvar = ... then you can access them from another file by
import myfile
y = myfile.myfun(...)
x = myfile.myvar
or
from myfile import myfun, myvar, myclass
Crucially, anything at the top level of myfile is accessible, including imports. So if myfile contains from foo import bar, then myfile.bar is also available.

Categories