Unit testing file write in Python

I am writing a wrapper for the ConfigParser in Python to provide an easy interface for storing and retrieving application settings.
The wrapper has two methods, read and write, and a set of properties for the different application settings.
The write method is just a wrapper for the ConfigParser's write method with the addition of also creating the file object needed by the ConfigParser. It looks like this:
def write(self):
    f = open(self.path, "w")
    try:
        self.config_parser.write(f)
    finally:
        f.close()
I would like to write unit tests asserting that this method raises an IOError if the file cannot be written to, and, in the other case, that the write method of the config parser was called.
The second test is quite easy to handle with a mock object. But the open call makes things a little tricky. In the end I have to create a file object to pass to the config parser. The fact that a file will actually be created when running this code doesn't make it very useful for a unit test. Is there some strategy for mocking file creation? Can this piece of code be tested in some way? Or is it just too simple to be tested?

First, you don't actually need to unit test open(), since it's pretty reasonable to assume that the standard library is correct.
Next, you don't want to do file system manipulations to get open() to generate the error you want, because then you're not unit testing, you're doing a functional/integration test by including the file system.
So you could perhaps replace open() in the global namespace with a surrogate that just raises an IOError. Though you'd probably need to make sure you put things back if execution continues.
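As a rough sketch of that idea (SettingsWrapper is a stand-in for your class; Python 3 shown, where the builtin lives in builtins rather than __builtin__):

import builtins
import unittest

class WriteErrorTest(unittest.TestCase):
    def test_write_raises_ioerror(self):
        real_open = builtins.open

        def failing_open(*args, **kwargs):
            raise IOError("simulated: file could not be written")

        builtins.open = failing_open
        try:
            wrapper = SettingsWrapper("/some/path")  # hypothetical wrapper class
            self.assertRaises(IOError, wrapper.write)
        finally:
            builtins.open = real_open  # put things back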
But in the end, what value does the test have? There's so little in that code snippet that's your own system. Even replacing open() really just ends up being a test that says "does the try and finally statement in Python work?"
My suggestion? Just add a statement to the docstring that records your expectation. "Raises an IOError if the file can't be written." Then move on. You can add a unit test later if this method gains some complexity (and merit for testing).

Actually, only open could throw an exception in your code. The docs for write() don't say anything about exceptions; at most you might get a ValueError or the like for a bad file pointer (which would result from open failing, and can't be the case here).
Provoking an IOError from open is easy: create the file somewhere you don't have write access, or change its permissions so you can't write to it.
You might want to use the with statement here though; it will handle the closing itself.
In Python 2.5 you need the first line; in later versions you don't.
from __future__ import with_statement  # Python 2.5 only

def write(self):
    with open(self.path, 'w') as f:
        self.config_parser.write(f)
The write method is guaranteed to be called if open succeeds, and won't be called if open raises an IOError. I don't know why you'd need a test to see if write was called. The code says that it does. Don't overdo your testing. ;)

Remember you don't have to test that open() or ConfigParser work—they're not part of your code—you just have to test that you use them correctly. You can monkeypatch the module with your own open(), just as for the instance attribute, and can return a mock from it that helps you test.
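A sketch of that approach using unittest.mock (names are illustrative; the patch target must be the open visible in your module's namespace, here assumed to be a module called settings):

from unittest import mock

import settings  # hypothetical module containing SettingsWrapper

def test_write_delegates_to_config_parser():
    m = mock.mock_open()
    with mock.patch("settings.open", m, create=True):
        wrapper = settings.SettingsWrapper("/some/path")
        wrapper.config_parser = mock.Mock()
        wrapper.write()
    m.assert_called_once_with("/some/path", "w")
    # the handle returned by the mocked open() is what write() should receive
    wrapper.config_parser.write.assert_called_once_with(m.return_value)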
However, unit tests are not my only tool, and this is one function that's simple enough to analyze and "prove"† that it works.
†Less rigorously than mathematicians would like, I'm sure, but good enough for me.

Related

How do I properly save state in case of an exception in python?

I want to
load data from files,
work on that data,
and eventually save that data back to files.
However, since step 2 may take several hours I want to make sure that progress is saved in case of an unexpected exception.
The data is loaded into an object to make it easy to work with it.
The first thing that came to my mind was to turn that object's class into a context manager and use the with statement. However, I'd have to write practically my entire program within that with statement, and that doesn't feel right.
So I had a look around and found this question which essentially asks for the same thing. Among the answers this one suggesting weakref.finalize seemed the most promising to me. However, there is a note at the bottom of the documentation that says:
Note: It is important to ensure that func, args and kwargs do not own any references to obj, either directly or indirectly, since otherwise obj will never be garbage collected. In particular, func should not be a bound method of obj.
Since I'd want to save fields of that object, I'd reference them, running right into this problem.
Does this mean that the object's __exit__ method will never be called, or that it will not be called until the program crashes/exits?
What is the pythonic way of doing this?
It's a little hacky, and you still wrap your code, but I normally just wrap main() in a try/except block.
Then you can handle the except with pdb.set_trace(), which means that whatever exception you get, your program will drop into an interactive terminal instead of breaking.
After that, you can manually inspect the error and dump any processed data into a pickle, or whatever you want to do. Once you fix the bug, set up your code to read the pickle and pick up from where it left off.
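A minimal sketch of that pattern (load_data, long_running_work and save_results are placeholders for your steps 1-3):

import pdb
import pickle

def main():
    data = load_data()                  # step 1 (placeholder)
    result = long_running_work(data)    # step 2, may run for hours
    save_results(result)                # step 3 (placeholder)

if __name__ == "__main__":
    try:
        main()
    except Exception:
        # drop into an interactive debugger instead of crashing; from
        # here you can inspect state and, e.g., pickle partial results:
        # pickle.dump(partial, open("progress.pkl", "wb"))
        pdb.set_trace()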

Python - how to handle an exception like PermissionError correctly?

I would like to ask how to handle this kind of situation in proper Python style.
I have an extentions_and_directories module which includes this function, among others:
import pathlib

def create_dir_if_not_exits(url):
    path = pathlib.Path(url)
    path.mkdir(parents=True, exist_ok=True)
If I try to create a subfolder of a folder for which I do not have permission, I get a PermissionError.
I could catch it at this stage, but I don't really know what I would do with it then. Raise another exception?
Returning False is probably not a good solution in terms of clean coding.
I use this function, for example, in the create_project method of the Project class. In a nutshell, this method creates a project file in the appropriate directory, creating that directory first if necessary. Simplified to the part we are interested in, it looks like this:
def create_project(directory, filename):
    # code [...]
    create_dir_if_not_exits(url)
    # code [...]
This class is just part of a module that helps organize the work, so it should not pop up any GUI or print any messages itself. It is only supposed to be a tool that I will use in other modules and classes. For example, I will use it like this (pseudo code; a rough sketch follows the list):
1. Open a dialog for the user to choose where to save the file.
2. create_project(directory, filename)
3. Save the file in the created directory.
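Sketched as real code, that flow might look like this (ask_user_for_location and show_error are hypothetical GUI helpers):

def save_project_with_retry():
    while True:
        directory, filename = ask_user_for_location()   # 1. dialog
        try:
            create_project(directory, filename)         # 2. may raise PermissionError
        except PermissionError:
            show_error("No permission to write there; please pick another location.")
            continue                                    # ask the user again
        return directory                                # 3. go on and save the file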
What is the proper way to behave in this situation?
Is it correct to add a try only at the last stage (written as pseudo code above), reacting with a GUI message and asking the user to pick a location again? Or, to do it properly, should I handle this exception somehow at an earlier stage?
Theoretically, a developer using my create_project method won't see the exception raised by create_project itself, but by one of the lines of the create_dir_if_not_exits function, which in turn is one line of create_project's code. So, as I understand the term, it is not strictly an exception of the create_project method. Is that a problem? Will it be perceived as bad practice that misleads the user of the method, or is letting the exception propagate out of create_dir_if_not_exits the correct practice, and should it stay that way?

Mocking disk-out-of-space in python unittests

I'm trying to write a unittest to test the behaviour of a function when the disk is full. I need file access functions to behave normally while most of the test runs, so that the file I'm creating is actually created; then at one point I need the disk to be 'full'. I can't find a way to do this using mock_open(), since the file object created by it doesn't seem to persist between function calls. I've tried to use pyfakefs and setting the disk size using self.fs.set_disk_usage(MAX_FS_SIZE), but when I try to run this in my tests it allows used_size to go negative, meaning there is always free space (though oddly, their example code works correctly).
Is there a way to simulate a disk-out-of-space error at a particular point in my code? Mocking the write function to have a side effect would be my immediate thought, but I can't access the file object that I'm writing to in my test code, as it's buried deep inside function calls.
Edit: looks like I've found a bug in pyfakefs
Edit2: bug in pyfakefs has been fixed; now works as expected. Still interested to know if there's a way to get f.write() to throw an OSError with a simple mock.
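One possible sketch, assuming the code under test reaches the file through the builtin open(): wrap the real file object so its write() starts raising ENOSPC after a chosen number of calls (run_code_under_test is a placeholder).

import builtins
import errno
from unittest import mock

real_open = builtins.open

class FailingWriter:
    """File-like wrapper whose write() raises OSError(ENOSPC) once
    `fail_after` successful writes have happened; everything else is
    delegated to the real file object."""
    def __init__(self, f, fail_after):
        self._f = f
        self._remaining = fail_after
    def write(self, data):
        if self._remaining <= 0:
            raise OSError(errno.ENOSPC, "No space left on device")
        self._remaining -= 1
        return self._f.write(data)
    def __getattr__(self, name):
        return getattr(self._f, name)   # delegate close(), flush(), ...
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        self._f.close()

def failing_open(*args, **kwargs):
    return FailingWriter(real_open(*args, **kwargs), fail_after=3)

# in the test:
# with mock.patch("builtins.open", failing_open):
#     run_code_under_test()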

How does using the try statement avoid a race condition?

When determining whether or not a file exists, how does using the try statement avoid a "race condition"?
I'm asking because a highly upvoted answer (update: it was deleted) seems to imply that using os.path.exists() creates an opportunity that would not exist otherwise.
The example given is:
try:
    with open(filename): pass
except IOError:
    print 'Oh dear.'
But I'm not understanding how that avoids a race condition compared to:
if not os.path.exists(filename):
    print 'Oh dear.'
How does calling os.path.exists(filename) allow the attacker to do something with the file that they could not already do?
The race condition is, of course, between your program and some other code that operates on the file (a race condition always requires at least two parallel processes or threads). That means using open() instead of exists() may really help only in two situations:
You check for existence of a file that is created or deleted by some background process (however, if you run inside a web server, that often means there are many copies of your process running in parallel to process HTTP requests, so for web apps race condition is possible even if there are no other programs).
There may be some malicious program running that is trying to crash your code by destroying the file at the moments you expect it to exist.
exists() just performs a single check. If the file exists, it may be deleted a microsecond after exists() returned True. If the file is absent, it may be created immediately.
However, open() does not just test for file existence; it also opens the file, and it does these two things atomically, so nothing can happen between the check and the opening. Usually files cannot be deleted while they are open by someone. That means that inside the with block you may be completely sure the file really exists, since it is open. Although this is true only inside the with block, and the file may still be deleted immediately after the block exits, putting the code that needs the file to exist inside the with block guarantees that the code will not fail.
Here's an example of usage:
try:
    with open('filename') as f:
        do_stuff_that_depends_on_the_existence_of_the_file(f)
except IOError as e:
    print 'Trouble opening file'
If you are opening the file with any access at all, then the OS will guarantee that the file exists, or else it will fail with an error. If the access is exclusive, any other process in contention for the file will either be blocked by you, or block you.
The try is just a way to detect the error or success of the act of opening the file, since file I/O APIs in Python typically do not have return codes (exceptions are used instead). So to really answer your question: it's not the try that avoids the race condition, it's the open. It's basically the same in C (on which Python is based), but without exceptions.
Note that you would probably want to execute code that depends on access to the file inside the try block. Once you close the file, its existence is no longer guaranteed.
Calling os.path.exists merely gives a snapshot at a moment in time when the file may or may not exist, and you have no knowledge of the existence of the file once os.path.exists returns. Malevolent code or unexpected logic may delete or change the file when you are not expecting it. It is akin to turning your head to check that a road is clear before driving into it. Once you turn your head back, you have nothing but a guess about what is going on where you are no longer looking. Holding the file open guarantees an extended consistent state, something not possible (for good or ill) when driving. :)
Your suggestion of checking that a file does not exist rather than using try/open is still insufficient because of the snapshot nature of os.path.exists. Unfortunately I know of no way to prevent files from being created in a directory in all cases, so I think it is best to check for the positive existence of a file, rather than its absence.
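If the reason for the existence check is "create the file only if it isn't there yet", the race-free alternative is to make the check and the creation a single atomic operation; a sketch (Python 3 shown):

import os

def create_exclusively(filename):
    # O_EXCL makes the kernel fail atomically if the file already exists,
    # so no other process can sneak in between the check and the creation
    fd = os.open(filename, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    return os.fdopen(fd, "w")

try:
    f = create_exclusively("output.txt")
except FileExistsError:
    print("someone beat us to it")
else:
    with f:
        f.write("data")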
I think what you're asking is the particular race condition where:
file is opened
context is switched and the file is deleted
context is switched back and file operations are attempted on the "opened" file
The way you're "protected" in this case is by putting all the file handling code in a try block: if at any point the file becomes inaccessible or corrupt, your file operations will be able to fail "gracefully" via the except block.
Note that of course on modern OSes this can't happen anyway: when a file is "deleted", the deletion won't take effect until all open handles on the file have been released.

Method logging in Python

I'd like something equivalent to
calling method: $METHOD_NAME
args: $ARGS
output: $OUTPUT
to be automatically logged to a file (via the logging module, possibly) for every (user-defined) method call. The best solution I can come up with is to write a decorator that will do this, and then add it to every function. Is there a better way?
Thanks
You could look at the trace module in the standard library, which
allows you to trace program execution, generate annotated statement coverage listings, print caller/callee relationships and list functions executed during a program run. It can be used in another program or from the command line.
You can also log to disk:
import sys
import trace

# create a Trace object, telling it what to ignore, and whether to
# do tracing or line-counting or both
tracer = trace.Trace(
    ignoredirs=[sys.prefix, sys.exec_prefix],
    trace=0,
    count=1)

# run the new command using the given tracer
tracer.run('main()')

# make a report, placing output in /tmp
r = tracer.results()
r.write_results(show_missing=True, coverdir="/tmp")
One approach that might simplify things a bit would be to use a metaclass to automatically apply your decorator for you. It'd cut down on the typing at the expense of requiring you to delve into the arcane and mysterious world of metaclass programming.
It depends on how exactly you are going to use it.
The most generic approach would be to follow the path of the stdlib's profile module and thereby gain control over each call, but that's somewhat slow.
If you know which modules you need to track before handing them control, I'd go with iterating over all their members and wrapping them with a tracking decorator. This way the tracked code stays clean, and it doesn't take much coding to implement.
A decorator would be a simple approach for a smaller project, however with decorators you have to be careful about passing arguments to make sure that they don't get mangled on their way through. A metaclass would probably be more of the "right" way to do it without having to worry about adding decorators to every new method.
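To make the decorator and metaclass options concrete, here is a rough sketch using the logging module (the class, method names and log format are illustrative):

import functools
import logging

logging.basicConfig(filename="calls.log", level=logging.DEBUG)
logger = logging.getLogger("method_logger")

def logged(func):
    """Decorator: log the name, arguments and output of every call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("calling method: %s", func.__name__)
        logger.debug("args: %r %r", args, kwargs)
        output = func(*args, **kwargs)
        logger.debug("output: %r", output)
        return output
    return wrapper

class LoggedMeta(type):
    """Metaclass that applies @logged to every plain method for you."""
    def __new__(mcls, name, bases, namespace):
        for attr, value in list(namespace.items()):
            if callable(value) and not attr.startswith("__"):
                namespace[attr] = logged(value)
        return super().__new__(mcls, name, bases, namespace)

class Calculator(metaclass=LoggedMeta):
    def add(self, a, b):
        return a + b

Calculator().add(2, 3)   # the call, its args and its output go to calls.log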
