I have a module that I need to test in python.
I'm using the unittest framework but I ran into a problem.
The module defines several functions, one of which (readConfiguration) is called at import time, like so:
.
.
.
def readConfiguration(file="default.xml"):
    # do some reading from xml
    ...

readConfiguration()
This is a problem because when I try to import the module it also runs readConfiguration(), which fails (no configuration file exists in the test environment) and breaks both the import and the program.
I'd like to be able to test the module independent of any configuration files.
I didn't write the module, and it cannot be refactored.
I know I can include a dummy configuration file but I'm looking for a "cleaner", more elegant, solution.
As commenters have already pointed out, imports should never have side effects, so try to get the module changed if at all possible.
If you really, absolutely, cannot do this, there might be another way: let readConfiguration() be called, but stub out its dependencies. For instance, if it uses the builtin open() function, you could mock that, as demonstrated in the mock documentation:
>>> from unittest.mock import MagicMock, patch, sentinel
>>> mock = MagicMock(return_value=sentinel.file_handle)
>>> with patch('builtins.open', mock):
...     import the_broken_module
...     # do your testing here
Replace sentinel.file_handle with StringIO("<contents of mock config file>") if you need to supply actual content.
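For instance, a minimal sketch of that variant, assuming readConfiguration() parses whatever handle open() returns (the XML content here is a placeholder):

>>> from io import StringIO
>>> from unittest.mock import MagicMock, patch
>>> mock = MagicMock(return_value=StringIO("<config><option>value</option></config>"))
>>> with patch('builtins.open', mock):
...     import the_broken_module  # readConfiguration() now reads the fake config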
It's brittle as it depends on the implementation of readConfiguration(), but if there really is no other way, it might be useful as a last resort.
Related
I am building a Python library. The functions I want available to users are in stemmer.py, and stemmer.py uses stemmerutil.py.
I was wondering whether there is a way to make stemmerutil.py not accessible to users.
If you want to hide implementation details from your users, there are two routes that you can go. The first uses conventions to signal what is and isn't part of the public API, and the other is a hack.
The convention for declaring an API within a Python library is to add all classes/functions/names that should be exposed to an __all__ list in the topmost __init__.py. It doesn't do many useful things; its main purpose nowadays is a symbolic "please use this and nothing else". Yours would probably look something like this:
urdu/urdu/__init__.py
from urdu.stemmer import Foo, Bar, Baz
__all__ = ["Foo", "Bar", "Baz"]
To emphasize the point, you can also give all definitions within stemmerUtil.py an underscore before their name, e.g. def privateFunc(): ... becomes def _privateFunc(): ...
But you can also just hide the code from the interpreter by making it a resource instead of a module within the package and loading it dynamically. This is a hack, and probably a bad idea, but it is technically possible.
First, rename stemmerUtil.py to just stemmerUtil - it is no longer a Python module and can't be imported with the import keyword. Next, replace this line in stemmer.py
import stemmerUtil
with
import importlib.util
import importlib.resources
# importlib.resources is new in Python 3.7; on 3.6 and lower, use the
# importlib_resources backport, which needs to be installed first

stemmer_util_spec = importlib.util.spec_from_loader("stemmerUtil", loader=None)
stemmerUtil = importlib.util.module_from_spec(stemmer_util_spec)

with importlib.resources.path("urdu", "stemmerUtil") as stemmer_util_path:
    with open(stemmer_util_path) as stemmer_util_file:
        stemmer_util_code = stemmer_util_file.read()

exec(stemmer_util_code, stemmerUtil.__dict__)
After running this code, you can use the stemmerUtil module as if you had imported it, but it is invisible to anyone who installed your package - unless they run this exact code as well.
But as I said, if you just want to communicate to your users which part of your package is the public API, the first solution is vastly preferable.
Short question: I have a module with objects. How can I arrange things so that if someone imports an object from my module, my specified exception is raised?
What I want to do: I am writing an architectural framework. A class for output depends on the external library jinja2. I want the framework to be usable without this dependency as well. In the package's __init__.py I write a conditional import of my class RenderLaTeX (if jinja2 is available, the class is imported; otherwise not).
The problem with this approach is that I have some code which uses this class RenderLaTeX, but when I run it on a fresh setup, I receive an error like ImportError: no class RenderLaTeX could be imported from output. This error is pretty unexpected and hard to understand until I recall that jinja2 must be installed beforehand.
I thought about this solution: if the class is not available, __init__.py can create a string with this name. If a user tries to instantiate this object with the usual class constructor, they'll get a more meaningful error. Unfortunately, a simple import
from output import RenderLaTeX
won't raise an error in this case.
What would you suggest, hopefully with the description of benefits and drawbacks?
Important UPD: I package my classes in modules and import them into the package via __init__.py, so that I write 'from lena.flow import ReadROOTFile' rather than 'from lena.flow.read_root_file import ReadROOTFile'.
When Python imports a module, all of the code inside the file from which you are importing is executed.
If your RenderLaTeX class is therefore placed in a separate file, you can freely add logic that prevents it from being imported (or run) if the required dependencies are missing.
For example:
try:
    import somethingidonthave
except ImportError:
    raise Exception('You need this module!')

class RenderLaTeX(object):
    pass
You can also add any custom message you want to the exception to better describe the error. This should work in both Python2 and Python3.
After a year of thinking, the solution appeared.
First of all, I think it is pretty pointless to overwrite the exception's type. The only good way would be to add a useful message about the missing import.
Second, I think that the syntax
from framework.renderers import MyRenderer
is really better than
from framework.renderers.my_renderer import MyRenderer
because it hides implementation details and requires less code from the user (I updated my question to reflect that). For the former syntax to work, I have to import MyRenderer in the module's __init__.py.
Now, in my_renderer.py I would usually import third-party packages with
import huge_specific_library
in the header. This placement is what PEP 8 recommends. However, it would make the whole framework.renderers module depend on huge_specific_library.
The solution for that is to violate PEP 8 and import the library inside the class itself:
class MyRenderer():
    def __init__(self):
        import huge_specific_library
        # ... use that ...
Here you can catch the exception if that is important, change its message, etc. There is another benefit: there exist guides on how to reduce import time, and they propose exactly this solution (I read them a long time ago and forgot where). Large modules take time to load, and if you keep every import at the top as the PEP 8 style guide suggests (I still think you usually should), your program may incur a long delay just doing its imports, before it has done anything useful.
The only caveat is this: if you import the library in __init__, you must also import it in every other class method that uses it; otherwise it won't be visible there.
For those who still doubt, I must add that since Python caches imports, a repeated import costs only a dictionary lookup, so this doesn't hurt performance as long as the method that does the import is not called too often.
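Putting the pieces together, a minimal sketch of this pattern, using the placeholder names from above (the error message text is just an example):

class MyRenderer:
    def __init__(self):
        try:
            import huge_specific_library
        except ImportError as err:
            # re-raise with a message that tells the user what is missing
            raise ImportError(
                "MyRenderer requires huge_specific_library; "
                "please install it first"
            ) from err
        # keeping a reference on self sidesteps the caveat above
        self._lib = huge_specific_library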
Overview
I'm running some scientific simulations and I want to process the resulting data in Python. The simulation produces a custom data type that is not used outside the chain of programs the simulation's authors produced, so unfortunately I need to use what they provide me.
They want me to install two files:
A module called sdds.py that defines a class that provides all user functions and two demos
A compiled module called sddsdatamodule.so that only provides helper functions to sdds.py.
(I find it strange that they offer two modules that are so inextricably connected; it doesn't seem like good coding practice to me, but using their code is probably better than rewriting things from scratch.) I'd prefer not to install them directly onto my path, side by side: they come from the same company and are designed to do one specific task together, namely accessing and manipulating SDDS-type files.
So I thought I would put them in a package. I could install that on my path, it would be self-contained, and I could easily find and uninstall or upgrade the modules from one location. Then I could hide their un-Pythonic solution in a more-Pythonic package without significantly rewriting things. Seems elegant.
Details
The package I actually use is found here:
http://www.aps.anl.gov/Accelerator_Systems_Division/Accelerator_Operations_Physics/software.shtml#PythonBinaries
Unfortunately, they only support Windows and Mac OS X right now. Compiling the source code is quite onerous, and apparently they have no significant requests for Linux/Unix. I have a Mac, so thankfully this isn't a problem for me.
So my directory tree looks like this:
SDDSPython/            # my toplevel package
    __init__.py        # designed to only import the SDDS class
    sdds.py            # defines the SDDS class and two demo methods
    sddsdatamodule.so  # defines the sddsdata module used by the SDDS class
My __init__.py file literally only contains this:
from sdds import SDDS
The sdds.py file contains the class definition and the two demo definitions. The only other code in the sdds.py file is:
import sddsdata, sys, time

class SDDS:
    (lots of code here)

def demo(output):
    (lots of code here)

def demo2(output):
    (lots of code here)
I can then import SDDSPython and check, using dir:
>>> import SDDSPython
>>> dir(SDDSPython)
['SDDS', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', 'sdds', 'sddsdata']
So I can now access the SDDS class via SDDSPython.SDDS
Question
How on earth did SDDSPython.sdds and SDDSPython.sddsdata get loaded into the SDDSPython namespace??
>>> SDDSPython.sdds
<module 'SDDSPython.sdds' from 'SDDSPython/sdds.pyc'>
>>> SDDSPython.sddsdata
<module 'SDDSPython.sddsdata' from 'SDDSPython/sddsdatamodule.so'>
I thought that by creating an __init__.py file I was specifically excluding the sdds and sddsdata modules from the SDDSPython namespace. What is going on? I can only assume this is happening due to something in the sddsdatamodule.so file, but how can a module affect its parent's namespace like that? I'm rather lost and don't know where to start. I've looked at the C code, but I don't see anything suspicious. To be fair, I probably don't know what something suspicious would look like; I'm not familiar enough with programming C extensions for Python.
Curious question--I did some investigation for you using a similar test case.
XML/
    __init__.py  # from indent import XMLIndentGenerator
    indent.py    # contains classes XMLIndentGenerator and Xml
    Sink.py
It appears that when you import a class from a module, even though you're only importing a portion of it, the entire module is accessible in the way you described, that is:
>>> import XML
>>> XML.indent
<module 'XML.indent' from 'XML\indent.py'>
>>> XML.indent.Xml  # did not include this in the from import
<class 'XML.indent.Xml'>
>>> XML.Sink
Traceback (most recent call last):
AttributeError: yadayada no attribute 'Sink'
This is expected, since I did not import Sink in __init__.py... BUT!
I added a line to indent.py:
import Sink

class XMLIndentGenerator(XMLGenerator):
    (code)
Now, since indent.py imports a module contained within the XML package, if I do:
>>> import XML
>>> XML.Sink
<module 'XML.Sink' from 'XML\Sink.pyc'>
So, it appears that because your imported sdds module also imports sddsdata, you are able to access it. That answers the "how" portion of your question; as for the "why", I'm sure there's an answer somewhere in the docs :)
I hope this helps - I was literally doing this as I was typing the answer! A learning experience for me as well.
This happens because Python imports don't work the way you might think. They work like this:
the import machinery looks for a file that should be the module requested by the import
a types.ModuleType instance is created; several of its attributes (__file__, __name__, and so on) are set according to the file that was found, and the object is inserted into sys.modules under the fully qualified module name it will have
if this is a submodule import (i.e., sdds.py, which is a submodule of SDDSPython), the newly created module is attached as an attribute to the existing Python module of the parent package
the file is "executed" with that module as its global scope; all names defined by that file appear as attributes of the module
in the case of a from import, an attribute of the module may be returned to the importing script
So that means that if I import a module (say, foo.py) whose source is only:
import bar
then there is a global in foo, called bar, and I can access it as foo.bar.
There is no capacity in python for "only execute the part of this python script i want to use right now." The whole thing runs.
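A small sketch makes this concrete (the file names are hypothetical):

# bar.py
GREETING = "hello"

# foo.py
import bar  # executes bar.py once and binds the module object to the name bar

# main.py
import sys
import foo

print(foo.bar.GREETING)               # "hello" - bar is an attribute of foo
print(foo.bar is sys.modules['bar'])  # True - the same cached module object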
Is there a standard way (without installing third party libraries) to do cross platform filesystem mocking in Python? If I have to go with a third party library, which library is the standard?
pyfakefs (homepage) does what you want – a fake filesystem; it’s third-party, though that party is Google. See How to replace file-access references for a module under test for discussion of use.
For mocking, unittest.mock is the standard library for Python 3.3+ (PEP 417); for earlier versions, see PyPI: mock (for Python 2.5+) (homepage).
Terminology in testing and mocking is inconsistent; using the Test Double terminology of Gerard Meszaros, you’re asking for a “fake”: something that behaves like a filesystem (you can create, open, and delete files), but isn’t the actual file system (in this case it’s in-memory), so you don’t need to have test files or a temporary directory.
In classic mocking, you would instead mock out the system calls (in Python, mock out functions in the os module, like os.remove and os.listdir), but that's much more fiddly.
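For example, a minimal sketch of that classic style with unittest.mock; count_entries is a made-up function under test:

import os
from unittest import mock

def count_entries(path):  # hypothetical function under test
    return len(os.listdir(path))

def test_count_entries():
    with mock.patch('os.listdir', return_value=['a.txt', 'b.txt']):
        assert count_entries('/not/a/real/dir') == 2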
pytest is gaining a lot of traction, and it can do all of this using tmpdir and monkeypatching (mocking).
You can use the tmpdir function argument, which provides a temporary directory unique to the test invocation, created in the base temporary directory (which is by default a sub-directory of the system temporary directory).
def test_create_file(tmpdir):
    p = tmpdir.mkdir("sub").join("hello.txt")
    p.write("content")
    assert p.read() == "content"
    assert len(tmpdir.listdir()) == 1
The monkeypatch function argument helps you to safely set/delete an attribute, dictionary item or environment variable or to modify sys.path for importing.
import os

def test_some_interaction(monkeypatch):
    monkeypatch.setattr(os, "getcwd", lambda: "/")
You can also pass it a function instead of using lambda.
import os.path

def getssh():  # pseudo application code
    return os.path.join(os.path.expanduser("~admin"), '.ssh')

def test_mytest(monkeypatch):
    def mockreturn(path):
        return '/abc'
    monkeypatch.setattr(os.path, 'expanduser', mockreturn)
    x = getssh()
    assert x == '/abc/.ssh'
    # You can still use a lambda when passing arguments, e.g.
    # monkeypatch.setattr(os.path, 'expanduser', lambda x: '/abc')
If your application has a lot of interaction with the file system, then it might be easier to use something like pyfakefs, as mocking would become tedious and repetitive.
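For reference, a minimal sketch of pyfakefs under pytest, assuming a reasonably recent pyfakefs (the fs fixture comes from its pytest plugin; the paths are made up):

# test_report.py - requires the pyfakefs pytest plugin for the fs fixture
def test_read_input(fs):
    # create a file on the fake, in-memory filesystem
    fs.create_file('/data/in.txt', contents='42')
    # ordinary file I/O is transparently redirected to the fake filesystem
    with open('/data/in.txt') as f:
        assert f.read() == '42'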
The standard mocking framework in Python 3.3+ is unittest.mock; you can use it for the filesystem or anything else. You could also simply hand-roll it via monkey patching:
A trivial example:
import os.path
os.path.isfile = lambda path: path == '/path/to/testfile'
A bit more full (untested):
import classtobetested
import contextlib
import unittest

@contextlib.contextmanager
def monkey_patch(module, fn_name, patch):
    unpatch = getattr(module, fn_name)
    setattr(module, fn_name, patch)
    try:
        yield
    finally:
        setattr(module, fn_name, unpatch)

class TestTheClassToBeTested(unittest.TestCase):
    def test_with_fs_mocks(self):
        with monkey_patch(classtobetested.os.path,
                          'isfile',
                          lambda path: path == '/path/to/file'):
            self.assertTrue(classtobetested.testable())
In this example, the actual mocks are trivial, but you could back them with something stateful that represents filesystem actions, such as save and delete. Yes, this is all a bit ugly, since it entails replicating/simulating a basic filesystem in code.
Note that you can't monkey patch python builtins. That being said, if at all possible use a third-party library: I'd go with Michael Foord's awesome Mock, which has been unittest.mock in the standard library since Python 3.3 (thanks to PEP 417) and is available on PyPI for Python 2.5+. And it can mock builtins!
Faking or Mocking?
Personally, I find that there are a lot of edge cases in filesystem things (like opening the file with the right permissions, string-vs-binary, read/write mode, etc), and using an accurate fake filesystem can find a lot of bugs that you might not find by mocking. In this case, I would check out the memoryfs module of pyfilesystem (it has various concrete implementations of the same interface, so you can swap them out in your code).
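For example, a minimal sketch using the in-memory backend (API as in pyfilesystem2; treat the exact method names as an assumption):

from fs.memoryfs import MemoryFS

def test_roundtrip():
    # an in-memory filesystem with the same interface as the other backends
    with MemoryFS() as mem:
        mem.writetext('hello.txt', 'hi there')
        assert mem.readtext('hello.txt') == 'hi there'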
Mocking (and without Monkey Patching!):
That said, if you really want to mock, you can do that easily with Python's unittest.mock library:
import unittest.mock

# production code file; note the default parameter
def make_hello_world(path, open_func=open):
    with open_func(path, 'w+') as f:
        f.write('hello, world!')

# test code file
def test_make_hello_world():
    # MagicMock supports the context-manager protocol; make __enter__
    # hand back the same mock so the write() call is recorded on it
    file_mock = unittest.mock.MagicMock()
    file_mock.__enter__.return_value = file_mock
    open_mock = unittest.mock.Mock(return_value=file_mock)

    # When make_hello_world() is called
    make_hello_world('/hello/world.txt', open_func=open_mock)

    # Then expect the file was opened and written-to properly
    open_mock.assert_called_once_with('/hello/world.txt', 'w+')
    file_mock.write.assert_called_once_with('hello, world!')
The above example only demonstrates creating and writing to files by mocking the open() function, but you could just as easily mock any function.
The standard unittest.mock library has a mock_open() function which provides basic mocking of the file system.
Benefits: It's part of the standard library, and inherits the various features of Mocks, including checking call parameters & usage.
Drawbacks: It doesn't maintain filesystem state the way pytest's tmpdir, pyfakefs, or mockfs do, so it's harder to test functions that do read/write interactions or interact with multiple files simultaneously.
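For example, a minimal sketch of mock_open(); read_config is a made-up function under test:

from unittest.mock import mock_open, patch

def read_config(path):  # hypothetical function under test
    with open(path) as f:
        return f.read()

def test_read_config():
    m = mock_open(read_data='<config/>')
    with patch('builtins.open', m):
        assert read_config('any/path.xml') == '<config/>'
    m.assert_called_once_with('any/path.xml')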
I am in a project where we are starting to refactor a massive code base. One problem that immediately sprang up is that each file imports a lot of other files. How do I mock this in my unit tests in an elegant way, without having to alter the actual code, so I can start writing tests?
As an example: the file with the functions I want to test imports ten other files which are part of our software, not Python core libs.
I want to be able to run the unit tests as separately as possible, and for now I am only going to test functions that do not depend on things from the files being imported.
Thanks for all the answers.
I didn't really know what I wanted to do from the start but now I think I know.
The problem was that some imports were only possible when the whole application was running, because of some third-party auto-magic. So I had to make some stubs for these modules in a directory which I pointed to with sys.path.
Now I can import the file which contains the functions I want to write tests for in my unit-test file without complaints about missing modules.
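For illustration, a sketch of that stub approach; every name here (thirdparty_magic, mymodule, the stubs directory) is made up:

# stubs/thirdparty_magic.py - a do-nothing stand-in for the real module
def register(*args, **kwargs):
    pass

# test_mymodule.py
import os
import sys

# make the stubs directory win the module search
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'stubs'))

import mymodule  # its 'import thirdparty_magic' now resolves to the stub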
If you want to import a module while at the same time ensuring that it doesn't import anything, you can replace the __import__ builtin function.
For example, use this class:
class ImportWrapper(object):
    def __init__(self, real_import):
        self.real_import = real_import

    def wrapper(self, wantedModules):
        def inner(moduleName, *args, **kwargs):
            if moduleName in wantedModules:
                print "IMPORTING MODULE", moduleName
                return self.real_import(moduleName, *args, **kwargs)
            else:
                print "NOT IMPORTING MODULE", moduleName
        return inner

    def mock_import(self, moduleName, wantedModules):
        __builtins__.__import__ = self.wrapper(wantedModules)
        try:
            __import__(moduleName, globals(), locals(), [], -1)
        finally:
            __builtins__.__import__ = self.real_import
And in your test code, instead of writing import myModule, write:
wrapper = ImportWrapper(__import__)
wrapper.mock_import('myModule', [])
The second argument to mock_import is a list of the module names you do want to be imported inside the wrapped module.
This example can be modified further, e.g. to import a different module than the one requested instead of just skipping it, or even to mock the module object with a custom object of your own.
If you really want to muck around with the Python import mechanism, take a look at the ihooks module. It provides tools for changing the behavior of the __import__ built-in. But it's not clear from your question why you need to do this.
"imports a lot of other files"? Imports a lot of other files that are part of your customized code base? Or imports a lot of other files that are part of the Python distribution? Or imports a lot of other open source project files?
If your imports don't work, you have a "simple" PYTHONPATH problem. Get all of your various project directories onto a PYTHONPATH that you can use for testing. We have a rather complex path; on Windows we manage it like this:
set Part1=c:\blah\blah\blah
set Part2=c:\some\other\path
set that=g:\shared\stuff
set PYTHONPATH=%Part1%;%Part2%;%that%
We keep each piece of the path separate so that we (a) know where things come from and (b) can manage change when we move things around.
Since the PYTHONPATH is searched in order, we can control what gets used by adjusting the order on the path.
Once you have "everything", it becomes a question of trust. Either you trust something (i.e., the Python code base) and just import it, or you don't trust something (i.e., your own code), in which case you test it separately and mock it for stand-alone testing.
Would you test the Python libraries? If so, you've got a lot of work. If not, then you should perhaps only mock out the things you're actually going to test.
No difficult manipulation is necessary if you want a quick-and-dirty fix before writing your unit tests.
If the unit tests are in the same file as the code you wish to test, simply delete the unwanted module from the globals() dictionary.
Here is a rather lengthy example: suppose you have a module impp.py with the contents:
value = 5
Now, in your test file you can write:
>>> import impp
>>> def printVal():
...     print impp.value
>>> print globals().keys()
['printVal', '__builtins__', '__file__', 'impp', '__name__', '__doc__']
Note that impp is among the globals, because it was imported. Calling the function printVal, which uses the impp module, still works:
>>> printVal()
5
But now, if you remove impp key from globals()...
>>> del globals()['impp']
>>> print globals().keys()
['printVal', '__builtins__', '__file__', '__name__', '__doc__']
...and try to call printVal(), you'll get:
>>> printVal()
Traceback (most recent call last):
  File "test_imp.py", line 13, in <module>
    printVal()
  File "test_imp.py", line 5, in printVal
    print impp.value
NameError: global name 'impp' is not defined
...which is probably exactly what you're trying to achieve.
To use this in your unit tests, you can delete the global just before running the test suite, e.g. in __main__:
if __name__ == '__main__':
    del globals()['impp']
    unittest.main()
In your comment above, you say you want to convince Python that certain modules have already been imported. This still seems like a strange goal, but if that's really what you want to do, you can in principle sneak around behind the import mechanism's back and change sys.modules. I'm not sure how this would work for package imports, but it should be fine for absolute imports.
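For example, a minimal sketch of pre-seeding sys.modules with a stub; heavy_dependency and module_under_test are hypothetical names:

import sys
import types

# build a stub module and register it before anything imports the real one
fake = types.ModuleType('heavy_dependency')
fake.connect = lambda *args, **kwargs: None
sys.modules['heavy_dependency'] = fake

import module_under_test  # its 'import heavy_dependency' now gets the stub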