Assert that an object can be serialized using joblib

Assert that an object can be serialized using joblib - python

I want to test whether an object can be serialized using joblib(!). Something like:
assert pickle.dumps(my_obj)
seems to be the way using pickle but joblib doesn't provide .dumps. I tried to do:
with tempfile.TemporaryFile("wb") as f:
assert joblib.dump(my_obj, f)
But this fails because joblib.dump returns None in this case (although according to the doc it should return something which evaluates to True).
What would be the equivalent if I'm using joblib?

According to the source, nothing is returned if you pass in a file object, only if you pass in a file name. https://github.com/joblib/joblib/blob/master/joblib/numpy_pickle.py#L510
So using a named temp file and passing on the name should do the trick.
Running the code and doing the assert on the file size seems also a valid strategy.

Related

Mocking a file containing JSON data with mock.patch and mock_open

I'm trying to test a method that requires the use of json.load in Python 3.6.
And after several attempts, I tried running the test "normally" (with the usual unittest.main() from the CLI), and in the iPython REPL.
Having the following function (simplified for purpose of the example)
def load_metadata(name):
with open("{}.json".format(name)) as fh:
return json.load(fh)
with the following test:
class test_loading_metadata(unittest2.TestCase):
#patch('builtins.open', new_callable=mock_open(read_data='{"disabled":True}'))
def test_load_metadata_with_disabled(self, filemock):
result = load_metadata("john")
self.assertEqual(result,{"disabled":True})
filemock.assert_called_with("john.json")
The result of the execution of the test file, yields a heart breaking:
TypeError: the JSON object must be str, bytes or bytearray, not 'MagicMock'
While executing the same thing in the command line, gives a successful result.
I tried in several ways (patching with with, as decorator), but the only thing that I can think of, is the unittest library itself, and whatever it might be doing to interfere with mock and patch.
Also checked versions of python in the virtualenv and ipython, the versions of json library.
I would like to know why what looks like the same code, works in one place
and doesn't work in the other.
Or at least a pointer in the right direction to understand why this could be happening.

json.load() simply calls fh.read(), but fh is not a mock_open() object. It's a mock_open()() object, because new_callable is called before patching to create the replacement object:
>>> from unittest.mock import patch, mock_open
>>> with patch('builtins.open', new_callable=mock_open(read_data='{"disabled":True}')) as filemock:
... with open("john.json") as fh:
... print(fh.read())
...
<MagicMock name='open()().__enter__().read()' id='4420799600'>
Don't use new_callable, you don't want your mock_open() object to be called! Just pass it in as the new argument to #patch() (this is also the second positional argument, so you can leave off the new= here):
#patch('builtins.open', mock_open(read_data='{"disabled":True}'))
def test_load_metadata_with_disabled(self, filemock):
at which point you can call .read() on it when used as an open() function:
>>> with patch('builtins.open', mock_open(read_data='{"disabled":True}')) as filemock:
... with open("john.json") as fh:
... print(fh.read())
...
{"disabled":True}
The new argument is the object that'll replace the original when patching. If left to the default, new_callable() is used instead. You don't want new_callable() here.

Wrapping urllib3.HTTPResponse in io.TextIOWrapper

I use AWS boto3 library which returns me an instance of urllib3.response.HTTPResponse. That response is a subclass of io.IOBase and hence behaves as a binary file. Its read() method returns bytes instances.
Now, I need to decode csv data from a file received in such a way. I want my code to work on both py2 and py3 with minimal code overhead, so I use backports.csv which relies on io.IOBase objects as input rather than on py2's file() objects.
The first problem is that HTTPResponse yields bytes data for CSV file, and I have csv.reader which expects str data.
>>> import io
>>> from backports import csv # actually try..catch statement here
>>> from mymodule import get_file
>>> f = get_file() # returns instance of urllib3.HTTPResponse
>>> r = csv.reader(f)
>>> list(r)
Error: iterator should return strings, not bytes (did you open the file in text mode?)
I tried to wrap HTTPResponse with io.TextIOWrapper and got error 'HTTPResponse' object has no attribute 'read1'. This is expected becuase TextIOWrapper is intended to be used with BufferedIOBase objects, not IOBase objects. And it only happens on python2's implementation of TextIOWrapper because it always expects underlying object to have read1 (source), while python3's implementation checks for read1 existence and falls back to read gracefully (source).
>>> f = get_file()
>>> tw = io.TextIOWrapper(f)
>>> list(csv.reader(tw))
AttributeError: 'HTTPResponse' object has no attribute 'read1'
Then I tried to wrap HTTPResponse with io.BufferedReader and then with io.TextIOWrapper. And I got the following error:
>>> f = get_file()
>>> br = io.BufferedReader(f)
>>> tw = io.TextIOWrapper(br)
>>> list(csv.reader(f))
ValueError: I/O operation on closed file.
After some investigation it turns out that the error only happens when the file doesn't end with \n. If it does end with \n then the problem does not happen and everything works fine.
There is some additional logic for closing underlying object in HTTPResponse (source) which is seemingly causing the problem.
The question is: how can I write my code to
work on both python2 and python3, preferably with no try..catch or version-dependent branching;
properly handle CSV files represented as HTTPResponse regardless of whether they end with \n or not?
One possible solution would be to make a custom wrapper around TextIOWrapper which would make read() return b'' when the object is closed instead of raising ValueError. But is there any better solution, without such hacks?

Looks like this is an interface mismatch between urllib3.HTTPResponse and file objects. It is described in this urllib3 issue #1305.
For now there is no fix, hence I used the following wrapper code which seemingly works fine:
class ResponseWrapper(io.IOBase):
"""
This is the wrapper around urllib3.HTTPResponse
to work-around an issue shazow/urllib3#1305.
Here we decouple HTTPResponse's "closed" status from ours.
"""
# FIXME drop this wrapper after shazow/urllib3#1305 is fixed
def __init__(self, resp):
self._resp = resp
def close(self):
self._resp.close()
super(ResponseWrapper, self).close()
def readable(self):
return True
def read(self, amt=None):
if self._resp.closed:
return b''
return self._resp.read(amt)
def readinto(self, b):
val = self.read(len(b))
if not val:
return 0
b[:len(val)] = val
return len(val)
And use it as follows:
>>> f = get_file()
>>> r = csv.reader(ResponseWrapper(io.TextIOWrapper(io.BufferedReader(f))))
>>> list(r)
The similar fix was proposed by urllib3 maintainers in the bug report comments but it would be a breaking change hence for now things will probably not change, so I have to use wrapper (or do some monkey patching which is probably worse).

How do I get python unittest to test that a function returns a csv.reader object?

I'm using python 2.7 and delving into TDD. I'm trying to test a simple function that uses the csv module and returns a csv.reader object. I want to test that the correct type of object is being returned with the assertIsInstance test however I'm having trouble figuring out how to make this work.
#!/usr/bin/python
import os, csv
def importCSV(fileName):
'''importCSV brings in the CSV transaction file to be analyzed'''
try:
if not(os.path.exists("data")):
os.makedirs("data")
except(IOError):
return "Couldn't create data directory!"
try:
fileFullName = os.path.join("data", fileName)
return csv.reader(file(fileFullName))
except(IOError):
return "File not found!"
The test currently looks like this....
#!/usr/bin/python
from finaImport import finaImport
import unittest, os, csv
class testImport(unittest.TestCase):
'''Tests for importing a CSV file'''
def testImportCSV(self):
''' Test a good file and make sure importCSV returns a csv reader object '''
readerObject = finaImport.importCSV("toe")
self.assertTrue(str(type(readerObject))), "_csv.reader")
I really don't think wrapping "toe" in a str and type function is correct. When I try something like...
self.assertIsInstance(finaImport.importCSV("toe"), csv.reader)
It returns an error like...
TypeError: isinstance() arg2 must be a class, type, or tuple of classes and types
Help???

self.assertTrue(str(type(readerObject)), "_csv.reader")
I don't think that your first test (above) is so bad (I fixed a small typo there; you had an extra closing parenthesis). It checks that the type name is exactly "_csv.reader". On the other hand, the underscore in "_csv" tells you that this object is internal to the csv module. In general, you shouldn't be concerned about that.
Your attempt at the assertIsInstance test is flawed in that csv.reader is a function object. If you try it in the REPL, you see:
>>> import csv
>>> csv.reader
<built-in function reader>
Often, we care less about the type of an object and more about whether it implements a certain interface. In this case, the help for csv.reader says:
>>> help(csv.reader)
... The returned object is an iterator. ...
So, you could do the following test (instead or in addition to your other one):
self.assertIsInstance(readerObject, collections.Iterator)
You'll need a import collections for that, of course. And, you might want to test that the iterator returns lists of strings, or something like this. That would allow you to use something else under the hood later and the test would still pass.

Mock_open CSV file not getting any data

I am trying to unit test a piece of code:
def _parse_results(self, file_name):
results_file = open(file_name)
results_data = list(csv.reader(results_file))
index = len(results_data[1])-1
results_file.close()
return float(results_data[1][index])
by using mock_open like so:
#mock.patch('path.open', mock.mock_open(read_data='test, test2, test3, test4'))
def test_parse_results(self):
cut = my_class(emulate=True)
self.assertEqual(VAL, cut._parse_results('file'))
The problem I am running into is that I do not get any data when running csv.reader. If I run results_file.readlines() I get 'test, test2, test3, test4' which means that mock_open is working properly. But when I run csv.reader(results_file) I lose all the data.

This is because mock_open doesn't implement every feature that a file has, and notably not some of the ones that csv needs.
mock_open implements the methods read(), readline() and readlines(), and works both as a function and when called as a context manager (https://docs.python.org/3/library/unittest.mock.html#mock-open), whereas csv.reader works with…
any object which supports the iterator protocol and returns a string each time its __next__() method is called — file objects and list objects are both suitable
— https://docs.python.org/3/library/csv.html#csv.reader
Note that mock_open doesn't implement the __next__() method, and doesn't raise StopIteration when the end is reached, so it won't work with csv.reader.
The solution, as #Emily points out in her answer, is to turn the file into a list of its lines. This is possible because mock_open implements readlines(), and the resulting list is suitable for reading into csv.reader as the documentation says.

This really got me too, and was a nightmare to pinpoint. To use your example code, this works
results_data = list(csv.reader(results_file.read()))
and this works
results_data = list(csv.reader(results_file.readlines()))
but this doesn't work
results_data = list(csv.reader(results_file))
using Python 3.4.
It seems counter to the documented interface of csv.reader so maybe an expert can elaborate on why.

How do you check if an object is an instance of 'file'?

It used to be in Python (2.6) that one could ask:
isinstance(f, file)
but in Python 3.0 file was removed.
What is the proper method for checking to see if a variable is a file now? The What'sNew docs don't mention this...

def read_a_file(f)
try:
contents = f.read()
except AttributeError:
# f is not a file
substitute whatever methods you plan to use for read. This is optimal if you expect that you will get passed a file like object more than 98% of the time. If you expect that you will be passed a non file like object more often than 2% of the time, then the correct thing to do is:
def read_a_file(f):
if hasattr(f, 'read'):
contents = f.read()
else:
# f is not a file
This is exactly what you would do if you did have access to a file class to test against. (and FWIW, I too have file on 2.6) Note that this code works in 3.x as well.

In python3 you could refer to io instead of file and write
import io
isinstance(f, io.IOBase)

Typically, you don't need to check an object type, you could use duck-typing instead i.e., just call f.read() directly and allow the possible exceptions to propagate -- it is either a bug in your code or a bug in the caller code e.g., json.load() raises AttributeError if you give it an object that has no read attribute.
If you need to distinguish between several acceptable input types; you could use hasattr/getattr:
def read(file_or_filename):
readfile = getattr(file_or_filename, 'read', None)
if readfile is not None: # got file
return readfile()
with open(file_or_filename) as file: # got filename
return file.read()
If you want to support a case when file_of_filename may have read attribute that is set to None then you could use try/except over file_or_filename.read -- note: no parens, the call is not made -- e.g., ElementTree._get_writer().
If you want to check certain guarantees e.g., that only one single system call is made (io.RawIOBase.read(n) for n > 0) or there are no short writes (io.BufferedIOBase.write()) or whether read/write methods accept text data (io.TextIOBase) then you could use isinstance() function with ABCs defined in io module e.g., look at how saxutils._gettextwriter() is implemented.

Works for me on python 2.6... Are you in a strange environment where builtins aren't imported by default, or where somebody has done del file, or something?

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Assert that an object can be serialized using joblib - python

Related

Mocking a file containing JSON data with mock.patch and mock_open

Wrapping urllib3.HTTPResponse in io.TextIOWrapper

How do I get python unittest to test that a function returns a csv.reader object?

Mock_open CSV file not getting any data

How do you check if an object is an instance of 'file'?

Categories

Resources