Python mocking delete

I practice TDD but I have not used mocking before.
Suppose I want to build a function that should create a folder, but only if that folder does not already exist. As part of my TDD cycle I first want to create a test to see that my function won’t delete an already existing folder.
As my function will probably use os.rmdir, I gather I could use mocking to see whether os.rmdir has been called or not. But this isn't very satisfactory, as there are many ways to delete folders. What if I later change my function to use shutil.rmtree? os.rmdir would not have been called, but perhaps the function now incorrectly removes the folder.
Is it possible to use mocking in a way which is insensitive to the method? (Without actually creating folders on my machine and checking whether they are deleted or not, which is what I have been doing until now.)

I can think of 2 options:
You wrap the deletion in a function, e.g. delete_folder; in the tests, you mock that function and check whether it has been called (a sketch follows this list).
You use pyfakefs to mock the entire filesystem. I haven't used it yet, but it seems to be very powerful.
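For the first option, here is a minimal sketch using unittest.mock. The names ensure_folder and delete_folder are hypothetical stand-ins for the code under test; the point is that deletion has a single choke point, so the assertion survives a later switch to shutil.rmtree inside delete_folder:

import os
import unittest
from unittest import mock

# Hypothetical production code: every deletion goes through delete_folder,
# so tests can assert on that one name no matter how deletion is
# implemented inside it.
def delete_folder(path):
    os.rmdir(path)  # could later become shutil.rmtree(path)

def ensure_folder(path):
    if not os.path.isdir(path):
        os.mkdir(path)

class EnsureFolderTest(unittest.TestCase):
    @mock.patch(f"{__name__}.delete_folder")
    def test_existing_folder_is_not_deleted(self, mock_delete):
        # Simulate an already existing folder, and stub out mkdir too
        with mock.patch("os.path.isdir", return_value=True), \
             mock.patch("os.mkdir") as mock_mkdir:
            ensure_folder("/already/there")
        mock_delete.assert_not_called()
        mock_mkdir.assert_not_called()

if __name__ == "__main__":
    unittest.main()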

The problem with "mockism" is that such tests bind your code to a particular implementation. Once you have decided to test for a particular function call, your production code has to make (or, as in your example, not make) exactly that call.
As you have already noticed, there are plenty of ways to remove a directory (even by running rm -rf as an external process).
I think the way you are doing it already is the best: you check for the actual side effect you are interested in, no matter how it has been generated.
If you are wondering about performance, you may try to make that test optional, and run it less frequently than the rest of your test suite.

Related

Mocking disk-out-of-space in python unittests

I'm trying to write a unit test to check the behaviour of a function when the disk is full. I need file access functions to behave normally during most of the test, so that the file I'm creating is actually created; then at one point I need the disk to be 'full'. I can't find a way to do this using mock_open(), since the file object it creates doesn't seem to persist between function calls. I've tried using pyfakefs and setting the disk size with self.fs.set_disk_usage(MAX_FS_SIZE), but when I run this in my tests it allows used_size to go negative, meaning there is always free space (though oddly, their example code works correctly).
Is there a way to simulate a disk-out-of-space error at a particular point in my code? Mocking the write function to have a side effect would be my immediate thought, but I can't access the file object I'm writing to from my test code, as it's buried deep inside function calls.
Edit: looks like I've found a bug in pyfakefs
Edit 2: the bug in pyfakefs has been fixed and it now works as expected. I'm still interested to know whether there's a way to get f.write() to throw an OSError with a simple mock.
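One possible way to do that with a simple mock (a sketch, not an answer quoted from the thread): the handle returned by mock_open() is itself a Mock, so its write method can be given a side_effect list whose later entries raise. save_chunks below is a hypothetical stand-in for the function under test:

import errno
import unittest
from unittest import mock

# Hypothetical function under test: writes several chunks to one file.
def save_chunks(path, chunks):
    with open(path, "w") as f:
        for chunk in chunks:
            f.write(chunk)

class DiskFullTest(unittest.TestCase):
    def test_second_write_hits_full_disk(self):
        m = mock.mock_open()
        # First write "succeeds" (returns a byte count); the second raises
        # as if the disk filled up partway through the function.
        m.return_value.write.side_effect = [
            5,
            OSError(errno.ENOSPC, "No space left on device"),
        ]
        with mock.patch("builtins.open", m):
            with self.assertRaises(OSError):
                save_chunks("big_file.txt", ["hello", "world"])

if __name__ == "__main__":
    unittest.main()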

Monkeypatch persisting across unit tests python

I have a custom framework which runs different code for different clients. I have monkeypatched certain methods in order to customize functionality for a client.
Here is the pattern simplified:
# import monkeypatches here
if self.config['client'] == 'cool_dudes':
    from app.monkeypatches import Stuff
if self.config['client'] == 'cool_dudettes':
    from app.monkeypatches import OtherStuff
Here is an example patch:
from app.framework.stuff import Stuff

def function_override(self):
    pass

Stuff.function = function_override
This works fine in production, because the program is executed in a batch manner, spinning up from scratch every time. However, when running my unit tests, I find that the monkey patches persist across tests, causing unexpected behavior.
I realize that it would be far better to use an object oriented inheritance approach to these overrides, but I inherited this codebase and am not currently empowered to rearchitect it to that degree.
Barring properly re-architecting the program, how can I prevent these monkey patches from persisting across unit tests?
The modules, including app.framework.<whatever>, are not reloaded for every test, so any changes you make in them persist. The same happens if your module is stateful (that's one of the reasons global state is not such a good idea; better to keep state in objects).
Your options are to:
undo the monkey-patches when needed, or
change them into something more generic that would change (semi-)automatically depending on the test running, or
(preferred) Do not reinvent the wheel: use an existing, manageable, time-proven solution for your task (or at least base your work on one if it doesn't meet your requirements completely). E.g. if you use the patches for mocking, see "How can one mock/stub python module like urllib". Among the suggestions there is mock.patch, which applies the patch for a specific test and undoes it upon completion.
Anyone coming here looking for information about monkeypatching might want to have a look at pytest's monkeypatch fixture. It avoids the OP's problem by automatically undoing all modifications after the test function has finished.
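A minimal sketch of the fixture in action; Stuff here is a stand-in class defined in the test file, not the app.framework.stuff.Stuff from the question:

class Stuff:
    def function(self):
        return "original"

def test_patched(monkeypatch):
    # setattr through the fixture is undone automatically after this test
    monkeypatch.setattr(Stuff, "function", lambda self: "patched")
    assert Stuff().function() == "patched"

def test_unpatched():
    # Runs after test_patched, yet sees the original method again
    assert Stuff().function() == "original"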

Test os methods

How would one test the os module methods provided in Python? For example, how would you test the use of os.mkdir?
def create_folder(self):
    os.mkdir("/parentFolder/newFolder")
What can be used to test this method?
This method would have test cases such as:
Verifying the folder was created
Insufficient permissions to create folder
Insufficient space to create folder
When unit-testing create_folder, you don't test os.mkdir. This is for two reasons:
It is part of an external library (in this case the standard library, but the same would be the case for third-party libraries), so should be covered by the test suites for that library; and
Even if it was part of your codebase, it's a different unit to the one under test.
Additionally, it's worth noting that your testing of this, as demonstrated by user2393256's answer, would likely be using other functionality from the same external library - if the test fails, do you conclude that os.mkdir didn't work or that os.path.isdir didn't work?
From the perspective of testing create_folder, what really matters is that it interacts with that function correctly. I would mock os.mkdir out (using e.g. unittest.mock) and check that it is being called with the appropriate path. You can also change the return value and side effects of the mock, allowing you to simulate things like insufficient permissions or space, and test your app's response to that, without having to somehow set up that environment for real. When testing other units of functionality that call create_folder I would then mock out create_folder entirely, as it's a tested and trusted unit.
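A sketch of that approach with unittest.mock; create_folder is written here as a module-level function for brevity, and the PermissionError simulation stands in for the insufficient-permissions case:

import os
import unittest
from unittest import mock

# Module-level version of the method from the question, for brevity.
def create_folder():
    os.mkdir("/parentFolder/newFolder")

class CreateFolderTest(unittest.TestCase):
    @mock.patch("os.mkdir")
    def test_calls_mkdir_with_expected_path(self, mock_mkdir):
        create_folder()
        mock_mkdir.assert_called_once_with("/parentFolder/newFolder")

    @mock.patch("os.mkdir", side_effect=PermissionError("denied"))
    def test_insufficient_permissions(self, mock_mkdir):
        # As written, create_folder propagates the error; a fuller
        # implementation might catch it and report a friendly message.
        with self.assertRaises(PermissionError):
            create_folder()

if __name__ == "__main__":
    unittest.main()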
Beyond the unit testing, you would have a level of integration testing, which makes sure that all parts of your application work together correctly. At this point you would test overall functionality, e.g. that you can save a file then later load it back in, rather than specifics like "is the folder created?"
Finally, and specifically for standard library functionality, you have to have a certain amount of trust that the language itself is tested (even if not directly, at least by the thousands of programs out there using this already!) and working.
To check if the directory was created you can use
os.path.isdir()
As for the permissions: there is a Python idiom which says
It's easier to ask for forgiveness than for permission
In that case I would try to create the folder and catch the exception that could be thrown.
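A minimal EAFP sketch of that idea; the path parameter and the handling choices are illustrative, not prescribed by the answer above:

import os

def create_folder(path):
    try:
        os.mkdir(path)
    except FileExistsError:
        pass  # folder already exists; nothing to do
    except PermissionError:
        print(f"No permission to create {path}")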

How to unit test helper methods in Python scheduled job?

I've read quite a few answers on here about testing helper methods (not necessarily private) in my unit tests and I'm still not quite sure what the best approach should be for my current situation.
I currently have a block of logic that runs as a scheduled job. It does a number of mostly related things, like updating local repositories, converting file types, committing changes to other repos, cleaning up old repos, etc. I need all of this code to run in a specific order, so rather than setting up a bunch of scheduled jobs, I took a lot of these small methods and put them into one large method that would enforce the order in which the code is run:
def mainJob():
    sync_repos()
    convert_files()
    commit_changes()
and so on. Now I'm not sure how to write my tests for this thing. It's frustrating to test the entire mainJob() function, because it does so many things and is really more of a reliability feature anyway. I see a lot of people saying I should only test the public interface, but I worry that some code will never be directly verified.
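This question has no answer in the thread, but one common pattern is to unit test each helper on its own, then test mainJob purely as an orchestrator: mock the helpers, attach them to a single parent mock, and assert on the recorded call order. A sketch, assuming mainJob and its helpers live in a hypothetical module named scheduler:

import unittest
from unittest import mock

import scheduler  # hypothetical module containing mainJob and its helpers

class MainJobTest(unittest.TestCase):
    def test_runs_steps_in_order(self):
        parent = mock.Mock()
        with mock.patch.object(scheduler, "sync_repos") as sync, \
             mock.patch.object(scheduler, "convert_files") as conv, \
             mock.patch.object(scheduler, "commit_changes") as commit:
            # Attaching the mocks to one parent records a global call order
            parent.attach_mock(sync, "sync_repos")
            parent.attach_mock(conv, "convert_files")
            parent.attach_mock(commit, "commit_changes")
            scheduler.mainJob()
        self.assertEqual(
            parent.mock_calls,
            [mock.call.sync_repos(),
             mock.call.convert_files(),
             mock.call.commit_changes()],
        )

if __name__ == "__main__":
    unittest.main()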

How to tell whether a Python function with dependencies has changed?

tl;dr:
How can I cache the results of a Python function to disk and in a later session use the cached value if and only if the function code and all of its dependencies are unchanged since I last ran it?
In other words, I want to make a Python caching system that automatically watches out for changed code.
Background
I am trying to build a tool for automatic memoization of computational results from Python. I want the memoization to persist between Python sessions (i.e. be reusable at a later time in another Python instance, preferably even on another machine with the same Python version).
Assume I have a Python module mymodule with some function mymodule.func(). Let's say I already solved the problem of serializing/identifying the function arguments, so we can assume that mymodule.func() takes no arguments if it simplifies anything.
Also assume that I guarantee that the function mymodule.func() and all its dependencies are deterministic, so mymodule.func() == mymodule.func().
The task
I want to run the function mymodule.func() today and save its results (and any other information necessary to solve this task). When I want the same result at a later time, I would like to load the cached result instead of running mymodule.func() again, but only if the code in mymodule.func() and its dependencies are unchanged.
To simplify things, we can assume that the function is always run in a freshly started Python interpreter with a minimal script like this:
from some_save_module import some_save_function  # placeholder serializer
import mymodule

result = mymodule.func()
some_save_function(result, 'filename')
Also, note that I don't want to be overly conservative. It is probably not too hard to use the modulefinder module to find all modules involved when running the first time, and then not use the cache if any module has changed at all. But this defeats my purpose, because in my use case it is very likely that some unrelated function in an imported module has changed.
Previous work and tools I have looked at
joblib memoizes results tied to the function name, and also saves the source code so we can check if it is unchanged. However, as far as I understand it does not check upstream functions (called by mymodule.func()).
The ast module gives me the Abstract Syntax Tree of any Python code, so I guess I can (in principle) figure it all out that way. How hard would this be? I am not very familiar with the AST. (A rough sketch of this idea follows this list.)
Can I use any of the black magic that's going on inside dill?
More trivia than a solution: IncPy, a finished/deceased research project, implemented a Python interpreter doing this by default, always. Nice idea, but never made it outside the lab.
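As a rough illustration of the AST idea (not a working solution): hash a function's source together with the source of any module-level functions it calls by bare name, recursively. code_fingerprint is a hypothetical helper; it deliberately ignores methods, closures, dynamic dispatch, and calls into other modules:

import ast
import hashlib
import inspect

# Only works for top-level functions: indented source (e.g. a method)
# would not parse on its own with ast.parse.
def code_fingerprint(func, seen=None):
    if seen is None:
        seen = set()
    if func in seen:
        return b""
    seen.add(func)
    source = inspect.getsource(func)
    digest = source.encode()
    module = inspect.getmodule(func)
    # Walk the AST looking for calls to bare names, and recurse into any
    # that resolve to functions defined in the same module.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            callee = getattr(module, node.func.id, None)
            if inspect.isfunction(callee):
                digest += code_fingerprint(callee, seen)
    return hashlib.sha256(digest).digest()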
Grateful for any input!
