Mocking disk-out-of-space in Python unittests

I'm trying to write a unittest to test the behaviour of a function when the disk is full. I need file access functions to behave normally for most of the test, so that the file I'm creating is actually created; then at one point I need the disk to be 'full'. I can't find a way to do this using mock_open(), since the file object created by it doesn't seem to persist between function calls. I've tried using pyfakefs and setting the disk size with self.fs.set_disk_usage(MAX_FS_SIZE), but when I run this in my tests it allows used_size to go negative, meaning there is always free space (though oddly, their example code works correctly).
Is there a way to simulate a disk-out-of-space error at a particular point in my code? Mocking the write function to have a side effect would be my immediate thought, but I can't access the file object I'm writing to from my test code, as it's buried deep inside function calls.
Edit: looks like I've found a bug in pyfakefs
Edit 2: the bug in pyfakefs has been fixed and it now works as expected. I'm still interested to know whether there's a way to get f.write() to throw an OSError with a simple mock.
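For that last part: with plain unittest.mock you can patch the built-in open() and give the handle's write() a side_effect, since mock_open()'s return_value is the file handle your code ends up using, however deeply the write is buried. A minimal sketch, where save_data is a hypothetical stand-in for the real function under test:

from unittest import mock

def save_data(path, chunks):
    # Hypothetical function under test: writes chunks to a file.
    with open(path, "w") as f:
        for chunk in chunks:
            f.write(chunk)

def test_disk_full_mid_write():
    m = mock.mock_open()
    # The first two writes succeed; the third raises, as a full disk
    # would. An exception instance in a side_effect list is raised.
    m.return_value.write.side_effect = [5, 5, OSError(28, "No space left on device")]
    with mock.patch("builtins.open", m):
        try:
            save_data("out.txt", ["aaaaa", "bbbbb", "ccccc"])
        except OSError as e:
            assert e.errno == 28  # ENOSPC
        else:
            assert False, "expected OSError"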

Related

How do I properly save state in case of an exception in Python?

I want to
1. load data from files,
2. work on that data,
3. and eventually save that data back to files.
However, since step 2 may take several hours I want to make sure that progress is saved in case of an unexpected exception.
The data is loaded into an object to make it easy to work with it.
The first thing that came to mind was to turn that object's class into a context manager and use the with statement. However, I'd have to write practically my entire program within that with statement, and that doesn't feel right.
So I had a look around and found this question, which essentially asks for the same thing. Among the answers, the one suggesting weakref.finalize seemed the most promising to me. However, there is a note at the bottom of the documentation that says:
Note: It is important to ensure that func, args and kwargs do not own any references to obj, either directly or indirectly, since otherwise obj will never be garbage collected. In particular, func should not be a bound method of obj.
Since I'd want to save fields of that object, I'd reference them, running right into this problem.
Does this mean that the object's __exit__ method will never be called, or that it will not be called until the program crashes/exits?
What is the pythonic way of doing this?
It's a little hacky, and you still wrap your code, but I normally just wrap main() in a try/except block.
Then you can handle the except with a pdb.set_trace(), which means that whatever exception you get, your program will drop into an interactive terminal instead of breaking.
After that, you can manually inspect the error and dump any processed data into a pickle, or whatever else you want to do. Once you fix the bug, set up your code to read the pickle and pick up from where it left off.
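A minimal sketch of that pattern, using pdb.post_mortem() rather than set_trace() so the debugger lands in the frame where the exception was raised (load_data, long_computation and save_data are hypothetical stand-ins for your own steps):

import pdb
import traceback

def main():
    data = load_data()             # hypothetical: step 1
    data = long_computation(data)  # hypothetical: step 2, may take hours
    save_data(data)                # hypothetical: step 3

if __name__ == "__main__":
    try:
        main()
    except Exception:
        traceback.print_exc()
        # Drops into an interactive debugger at the point of failure, so
        # partial results in main()'s locals can be inspected and pickled
        # by hand before the process exits.
        pdb.post_mortem()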

Is there an easy way to find what function is calling another function in Python?

I'm not too well-versed in Python. I think this question is mostly language-agnostic, but I wanted to see if there was a particular way to do it in Python.
I'm working with a couple of PyTorch Python scripts.
The function test_qtensor_cpu on line 147 in https://github.com/pytorch/pytorch/blob/fbcode/warm/test/quantization/core/test_quantized_tensor.py is being executed when I run the script in https://github.com/pytorch/pytorch/blob/fbcode/warm/test/test_quantization.py, but I cannot find the function that calls test_qtensor_cpu. I did a grep -ri "test_qtensor_cpu*" . in the root directory of this repo, and the only result was the definition of this function.
Is there a way for this function to be called without explicitly writing out the function's name?
Just add:

def my_func_i_cant_figure_out_whats_calling_it():
    import traceback
    traceback.print_stack()
That will show you the callstack at that point, even without a breakpoint.
Yes, it's possible to call a function without explicitly writing out its name.
(and figuring out how to use a debugger is super useful... future you will thank you if you figure it out sooner rather than later)
Line 29 of test_quantization.py imports TestQuantizedTensor (which includes the test_qtensor_cpu method).
The run_tests() (source here) at the bottom of the file will automatically run all test cases that have been imported (which includes TestQuantizedTensor) via unittest (usually via unittest.main, though this can be changed by args passed to the test suite).
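The mechanism is easy to reproduce in isolation: unittest.main() scans the __main__ module for TestCase subclasses and runs every method whose name starts with "test", so importing a test class is all it takes. A minimal sketch, with mytests as a hypothetical module name:

# runner.py
import unittest
from mytests import TestQuantizedTensor  # importing is all it takes

# test_qtensor_cpu is never named here; unittest.main() finds
# TestQuantizedTensor among this module's globals and runs all of
# its test_* methods.
if __name__ == "__main__":
    unittest.main()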

Python mocking delete

I practice TDD but I have not used mocking before.
Suppose I want to build a function that should create a folder, but only if that folder does not already exist. As part of my TDD cycle I first want to create a test to see that my function won’t delete an already existing folder.
As my function will probably use os.rmdir, I gather I could use mocking to see whether os.rmdir has been called or not. But this isn't very satisfactory, as there are many ways to delete folders. What if I change my function later on to use shutil.rmtree? os.rmdir would not have been called, but perhaps the function now incorrectly removes the folder.
Is it possible to use mocking in a way which is insensitive to the method? (without actually creating files on my machine and seeing whether they are deleted or not - what I have been doing until now)
I can think of two options:
1. You wrap the deletion in a function, e.g. delete_folder(); in the tests, you mock that function and check whether it has been called (see the sketch after this list).
2. You use pyfakefs to mock the entire filesystem. I haven't used it yet, but it seems to be very powerful.
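A minimal sketch of option 1, assuming the production code lives in a module called mymodule (hypothetical) and currently deletes via os.rmdir; the patch targets below follow that assumed layout:

# --- in mymodule (hypothetical) ---
import os

def delete_folder(path):
    # The one place that deletes; swapping in shutil.rmtree later
    # won't break the test below.
    os.rmdir(path)

def ensure_folder(path):
    """Create path only if it does not already exist; never delete it."""
    if not os.path.isdir(path):
        os.makedirs(path)

# --- in the test module ---
from unittest import mock
from mymodule import ensure_folder

def test_existing_folder_is_left_alone():
    with mock.patch("mymodule.delete_folder") as fake_delete, \
         mock.patch("os.path.isdir", return_value=True), \
         mock.patch("os.makedirs") as fake_makedirs:
        ensure_folder("already/there")
        fake_delete.assert_not_called()
        fake_makedirs.assert_not_called()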
The problem with "mockism" is that tests bind your code to a particular implementation. Once you have decided to test for a particular function call, you have to call (or not call, as in your example) that function in your production code.
As you have already noticed, there are plenty of ways to remove a directory (even by running rm -rf as an external process).
I think the way you are doing it already is the best: you check for the actual side effect you are interested in, no matter how it has been generated.
If you are worried about performance, you may make that test optional and run it less frequently than the rest of your test suite.

How to tell whether a Python function with dependencies has changed?

tl;dr:
How can I cache the results of a Python function to disk and in a later session use the cached value if and only if the function code and all of its dependencies are unchanged since I last ran it?
In other words, I want to make a Python caching system that automatically watches out for changed code.
Background
I am trying to build a tool for automatic memoization of computational results from Python. I want the memoization to persist between Python sessions (i.e. be reusable at a later time in another Python instance, preferably even on another machine with the same Python version).
Assume I have a Python module mymodule with some function mymodule.func(). Let's say I already solved the problem of serializing/identifying the function arguments, so we can assume that mymodule.func() takes no arguments if it simplifies anything.
Also assume that I guarantee that the function mymodule.func() and all its dependencies are deterministic, so mymodule.func() == mymodule.func().
The task
I want to run the function mymodule.func() today and save its results (and any other information necessary to solve this task). When I want the same result at a later time, I would like to load the cached result instead of running mymodule.func() again, but only if the code in mymodule.func() and its dependencies are unchanged.
To simplify things, we can assume that the function is always run in a freshly started Python interpreter with a minimal script like this:
from some_save_module import some_save_function  # placeholder names
import mymodule
result = mymodule.func()
some_save_function(result, 'filename')
Also, note that I don't want to be overly conservative. It is probably not too hard to use the modulefinder module to find all modules involved when running the first time, and then not use the cache if any module has changed at all. But this defeats my purpose, because in my use case it is very likely that some unrelated function in an imported module has changed.
Previous work and tools I have looked at
joblib memoizes results tied to the function name, and also saves the source code so we can check if it is unchanged. However, as far as I understand it does not check upstream functions (called by mymodule.func()).
The ast module gives me the Abstract Syntax Tree of any Python code, so I guess I can (in principle) figure it all out that way. How hard would this be? I am not very familiar with the AST. (A rough sketch of this route appears below.)
Can I use any of the black magic that's going on inside dill?
More trivia than a solution: IncPy, a finished/deceased research project, implemented a Python interpreter doing this by default, always. Nice idea, but never made it outside the lab.
Grateful for any input!
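To give a feel for the AST route mentioned above: a minimal sketch that hashes a function's source together with the sources of the functions it calls, recursively. It makes strong assumptions (every dependency is a plain Python function reachable by name through func.__globals__; attribute calls like obj.method() and C extensions are ignored), so it's a starting point rather than a solution:

import ast
import hashlib
import inspect
import textwrap

def _called_names(func):
    # Names that appear as simple call targets f(...) in func's source.
    tree = ast.parse(textwrap.dedent(inspect.getsource(func)))
    return {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }

def source_fingerprint(func, _seen=None):
    """SHA-256 over func's source plus, recursively, its function deps."""
    _seen = set() if _seen is None else _seen
    if func in _seen:
        return ""
    _seen.add(func)
    h = hashlib.sha256(textwrap.dedent(inspect.getsource(func)).encode())
    for name in sorted(_called_names(func)):
        dep = func.__globals__.get(name)
        if inspect.isfunction(dep):
            h.update(source_fingerprint(dep, _seen).encode())
    return h.hexdigest()

Compare the source_fingerprint(mymodule.func) recorded in the previous session against the current value; on a match, load the cached result, otherwise recompute.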

Unit testing file write in Python

I am writing a wrapper for the ConfigParser in Python to provide an easy interface for storing and retrieving application settings.
The wrapper has two methods, read and write, and a set of properties for the different application settings.
The write method is just a wrapper for the ConfigParser's write method with the addition of also creating the file object needed by the ConfigParser. It looks like this:
def write(self):
    f = open(self.path, "w")
    try:
        self.config_parser.write(f)
    finally:
        f.close()
I would like to write a unit test that asserts that this method raises an IOError if the file could not be written to, and, in the other case, that the write method of the config parser was called.
The second test is quite easy to handle with a mock object. But the open call makes things a little tricky. Eventually I have to create a file object to pass to the config parser. The fact that a file will actually be created when running this code doesn't make it very useful for a unit test. Is there some strategy for mocking file creation? Can this piece of code be tested in some way? Or is it just too simple to be tested?
First, you don't actually need to unit test open(), since it's pretty reasonable to assume that the standard library is correct.
Next, you don't want to do file system manipulations to get open() to generate the error you want, because then you're not unit testing, you're doing a functional/integration test by including the file system.
So you could perhaps replace open() in the global namespace with a surrogate that just raises an IOError, though you'd need to make sure you put things back if execution continues.
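On Python 3 that replace-and-restore comes for free with unittest.mock; a minimal sketch, where make_wrapper() is a hypothetical factory for the settings wrapper from the question:

import unittest
from unittest import mock

class WriteErrorTest(unittest.TestCase):
    def test_write_raises_ioerror_when_open_fails(self):
        wrapper = make_wrapper()  # hypothetical: builds the settings wrapper
        # open() is restored automatically when the with block exits,
        # even if the assertion fails.
        with mock.patch("builtins.open", side_effect=IOError("disk unhappy")):
            with self.assertRaises(IOError):
                wrapper.write()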
But in the end, what value does the test have? There's so little in that code snippet that's your own system. Even replacing open() really just ends up being a test that says "does the try and finally statement in Python work?"
My suggestion? Just add a statement to the docstring that records your expectation. "Raises an IOError if the file can't be written." Then move on. You can add a unit test later if this method gains some complexity (and merit for testing).
Actually, only open could throw an exception in your code. The docs for write() don't say anything about exceptions; possibly it could raise a ValueError or something for a bad file pointer (as a result of open failing, which can't be the case here).
Making an IOError for open is easy. Just create the file elsewhere and open it for writing there. Or you could change the permissions for it so you don't have access.
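For the permissions route, a quick POSIX-only sketch (directory permission bits work differently on Windows, and none of this stops root):

import os
import tempfile

d = tempfile.mkdtemp()
os.chmod(d, 0o500)  # drop write permission on the directory
try:
    open(os.path.join(d, "settings.ini"), "w")
except IOError as e:  # PermissionError on Python 3
    print("got the expected error:", e)
finally:
    os.chmod(d, 0o700)  # restore so the temp dir can be removed
    os.rmdir(d)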
You might want to use the with statement here though, as it'll handle the closing itself.
In Python 2.5 you need the first line below; in later versions you don't.
from __future__ import with_statement  # Python 2.5 only

def write(self):
    with open(self.path, 'w') as f:
        self.config_parser.write(f)
The write method is guaranteed to be called if open succeeds, and won't be called if open raises an IOError. I don't know why you'd need a test to see if write was called. The code says that it does. Don't overdo your testing. ;)
Remember you don't have to test that open() or ConfigParser work—they're not part of your code—you just have to test that you use them correctly. You can monkeypatch the module with your own open(), just as for the instance attribute, and can return a mock from it that helps you test.
However, unit tests are not my only tool, and this is one function that's simple enough to analyze and "prove"† that it works.
†Less rigorously than mathematicians would like, I'm sure, but good enough for me.
