I want to test a Python function that reads a gzip file and extracts something from the file (using pytest).
import gzip

def my_function(file_path):
    output = []
    with gzip.open(file_path, 'rt') as f:
        for line in f:
            output.append('something from line')
    return output
Can I create a gzip file-like object that I can pass to my_function? The object should have defined content and should work with gzip.open().
I know that I can create a temporary gzip file in a fixture, but this depends on the filesystem and other properties of the environment. Creating a file-like object from code would be more portable.
You can use the io and gzip libraries to create in-memory file objects. Example:
import io
import gzip

def inmem():
    stream = io.BytesIO()
    with gzip.open(stream, 'wb') as f:
        f.write(b'spam\neggs\n')
    stream.seek(0)
    return stream
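Since gzip.open() accepts an existing (binary) file object as well as a filename, the stream returned by inmem() can be passed straight to my_function. A minimal sketch of a pytest test using the content above:

def test_my_function():
    stream = inmem()
    # 'spam\neggs\n' decompresses to two lines, hence two extracted items
    assert my_function(stream) == ['something from line', 'something from line']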
You should never try to test outside code in a unit test. Only test the code you wrote. If you feel the need to test gzip, then gzip's maintainers are doing something wrong (they should be writing their own unit tests). Instead, do something like this:
from unittest import mock

# patch gzip where the module under test looks it up ('mymodule' is a
# stand-in for whichever module my_function lives in)
@mock.patch('mymodule.gzip')
def test_my_function(mock_gzip):
    file_path = 'testpath'
    mock_gzip.open.return_value = '<whatever you expect to be returned from gzip>'
    output = my_function(file_path=file_path)
    mock_gzip.open.assert_called_with(file_path, 'rt')
    assert output == ['<whatever you expect to be returned from your method>']
That's your whole unit test. All you want to know is that gzip.open() was called (you assume it works, or else gzip is failing and that's their problem) and that you got back what you expected from the method being tested. You specify what gzip returns based on what you expect it to return, but you never actually call the real gzip functions in your test.
It's a bit verbose but I'd do something like this (I have assumed that you saved my_function to a file called patch_one.py):
import patch_one  # this is the file with my_function in it
from unittest.mock import patch
from unittest import TestCase

class MyTestCase(TestCase):
    def test_my_function(self):
        # because my_function uses "with gzip.open(...) as f", we need a mock context
        class MyContext:
            def __enter__(self, *args, **kwargs):
                return [1, 2]  # note the two items

            def __exit__(self, *args, **kwargs):
                return None

        # in case we want to know the arguments to open()
        open_args = None

        def f(*args, **kwargs):
            def my_open(*args, **kwargs):
                nonlocal open_args
                open_args = args
                return MyContext()
            return my_open

        # patch the gzip.open in our file under test
        with patch('patch_one.gzip.open', new_callable=f):
            # finally, we can call the function we want to test
            ret_val = patch_one.my_function('not a real file path')

        # note the two items, corresponding to the list in __enter__()
        self.assertListEqual(['something from line', 'something from line'], ret_val)
        # check the arguments, just for fun
        self.assertEqual('rt', open_args[1])
If you want to try anything more complicated, I would recommend reading the unittest mock docs because how you import the "patch_one" file matters as does the string you pass to patch().
There will definitely be a way to do this with Mock or MagicMock but I find them a bit hard to debug so I went the long way round.
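For reference, a MagicMock version might look roughly like this sketch (my illustration, under the same patch_one assumptions as above):

import patch_one
from unittest.mock import MagicMock, patch

def test_my_function_with_magicmock():
    mock_file = MagicMock()
    mock_file.__enter__.return_value = [1, 2]  # two "lines", as before
    with patch('patch_one.gzip.open', return_value=mock_file) as mocked_open:
        ret_val = patch_one.my_function('not a real file path')
    mocked_open.assert_called_once_with('not a real file path', 'rt')
    assert ret_val == ['something from line', 'something from line']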
I'm trying to test a function that reads a file and returns the content of the file, or returns None if the file is not found.
import os
from pathlib import Path
from typing import Dict, Optional

import yaml
from yaml import SafeLoader

def read_yaml_from_cwd(file: str) -> Optional[Dict]:
    """Read a YAML file from the current working directory.

    Args:
        file: a .yaml or .yml file

    Returns:
        the file content as a dictionary, or None if the file is not found
    """
    path = os.path.join(Path.cwd().resolve(), file)
    if os.path.isfile(path):
        with open(path) as f:
            content = yaml.load(f, Loader=SafeLoader)
            return content
    else:
        return None
This is my test:
from unittest import mock, TestCase
from project import fs

class TextExamples(TestCase):
    def test_read_yaml_from_cwd():
        with mock.patch('os.listdir') as mocked_listdir:
            mocked_listdir.return_value = ['test-config.yml']
            val = fs.read_yaml_from_cwd("false-config.yml")
            assert val == None
            val2 = fs.read_yaml_from_cwd("false-config.yml")
            assert val2 != None
I guess I am fundamentally doing something wrong with these tests and what these mocks do. Can somebody help me with this?
One possibility to test this is to patch both os.path.isfile and open. To patch open, there is already a special mock function, mock_open, which gives you the possibility to set the contents of the mocked file. This means that you don't have to mock yaml.load, as it will return the mocked file content. This could look something like:
from unittest import mock, TestCase
from unittest.mock import mock_open

class YamlTest(TestCase):
    @mock.patch("builtins.open", mock_open(read_data="data"))
    @mock.patch("os.path.isfile")
    def test_read_yaml_from_cwd(self, patched_isfile):
        # valid file case
        patched_isfile.return_value = True
        result = read_yaml_from_cwd("some_file.yaml")
        self.assertEqual("data", result)

        # invalid file case
        patched_isfile.return_value = False
        result = read_yaml_from_cwd("some_file.yaml")
        self.assertEqual(None, result)
In this case you test that the function returns the file content if you pass a valid file name, and None for an invalid file name, which is probably all you want to test here.
For completeness, and because I mentioned it in the comments: using pyfakefs instead would replace the file system with a fake file system, which you can handle like a real filesystem. In this case it could look like:
from pyfakefs import fake_filesystem_unittest

class YamlTest(fake_filesystem_unittest.TestCase):
    def setUp(self) -> None:
        self.setUpPyfakefs()
        self.fs.create_file("some_file.yaml", contents="data")

    def test_read_yaml_from_cwd(self):
        # valid file case
        result = read_yaml_from_cwd("some_file.yaml")
        self.assertEqual("data", result)

        # invalid file case
        result = read_yaml_from_cwd("non_existing.yaml")
        self.assertEqual(None, result)
This makes sense if you have many filesystem related tests, though in your case this would probably be overkill.
Disclaimer: I'm a contributor to pyfakefs.
I have a method that contains external REST API calls.
Example:

import requests

def get_dataset():
    url = requests.get("http://api:5001/get_trainingdata")
    filename = url.text[0]
    return filename

When I use @patch for this function, I am able to unit test it, but coverage does not cover the whole function. How can I write a unit test for this method with full coverage?
My test case:

@mock.patch('api.get_dataset')
def test_using_decorator1(self, mocked_get_dataset):
    file = [{"file": "ddddd"}]
    mocked_get_dataset.return_value = Mock()
    mocked_get_dataset.return_value.json.return_value = file
    filename = file[0]
    self.assertEqual(filename, file[0])
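For what it's worth, here is a sketch of one way to get the function body executed (and therefore covered): patch requests.get inside the module under test instead of patching get_dataset itself. The module name api is taken from the patch target above; everything else is my assumption:

import api  # the module containing get_dataset, per the patch target above
from unittest import TestCase, mock

class GetDatasetTest(TestCase):
    @mock.patch('api.requests.get')
    def test_get_dataset(self, mocked_get):
        # url.text[0] -> "ddddd", so get_dataset() itself runs and is covered
        mocked_get.return_value.text = ["ddddd"]
        self.assertEqual("ddddd", api.get_dataset())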
I am working on a custom file path class, which should always execute a function after the corresponding system file has been written to and its file object closed. The function will upload the contents of the file path to a remote location. I want the upload functionality to happen entirely behind the scenes from a user perspective, i.e. the user can use the class just like any other os.PathLike class and automatically get the upload functionality. Pseudocode below for reference.
import os

class CustomPath(os.PathLike):
    def __init__(self, remote_path: str):
        self._local_path = "/some/local/path"
        self._remote_path = remote_path

    def __fspath__(self) -> str:
        return self._local_path

    def upload(self):
        # Upload local path to remote path.
        ...
I can of course automatically call the upload function when the user calls any of the class's methods directly. However, it is unclear to me how to automatically call the upload function if someone writes to the file with the builtin open, as follows:
custom_path = CustomPath("some remote location")
with open(custom_path, "w") as handle:
    handle.write("Here is some text.")
or
custom_path = CustomPath("some remote location")
handle = open(custom_path, "w")
handle.write("Here is some text.")
handle.close()
I desire compatibility with invocations of the open function, so that the upload behavior will work with all third-party file writers. Is this kind of behavior possible in Python?
Yes, it is possible in Python by making use of function overriding, a custom context manager, and the __getattr__ facility. Here's the basic logic:

1. Override the builtins.open() function with a custom open class.
2. Make it compatible with context managers using the __enter__ and __exit__ methods.
3. Make it compatible with normal read/write operations using the __getattr__ method.
4. Call the builtins method from the class whenever necessary.
5. Automatically invoke the callback function when the close() method is called.
Here's the sample code:
import builtins
import os

to_be_monitered = ['from_file1.txt', 'from_file2.txt']

# callback function (called when a file closes)
def upload(content_file):
    # check whether this file should be monitored
    if content_file in to_be_monitered:
        # copy the contents
        with builtins.open(content_file, 'r') as ff:
            with builtins.open(remote_file, 'a') as tf:
                # some logic for writing only new contents can be used here
                tf.write('\n' + ff.read())

class open(object):
    def __init__(self, path, mode):
        self.path = path
        self.mode = mode

    # called when the context manager is invoked
    def __enter__(self):
        self.file = builtins.open(self.path, self.mode)
        return self.file

    # called when the context manager returns
    def __exit__(self, *args):
        self.file.close()
        # after closing, call upload()
        upload(self.path)
        return True

    # called when the object is used without a context manager
    def __getattr__(self, item):
        self.file = builtins.open(self.path, self.mode)
        # if close is requested, call upload()
        if item == 'close':
            upload(self.path)
        return getattr(self.file, item)

if __name__ == '__main__':
    remote_file = 'to_file.txt'
    local_file1 = 'from_file1.txt'
    local_file2 = 'from_file2.txt'

    # just checks for and creates the remote file; not related to the actual problem
    if not os.path.isfile(remote_file):
        f = builtins.open(remote_file, 'w')
        f.close()

    # DRIVER CODE
    # writing with a context manager
    with open(local_file1, 'w') as f:
        f.write('some text written with context manager to file1')

    # writing without a context manager
    f = open(local_file2, 'w')
    f.write('some text written without using context manager to file2')
    f.close()

    # reading the remote file
    with open(remote_file, 'r') as f:
        print('remote file contains:\n', f.read())
What does it do:
Writes "some text written with context manager to file1" to from_file1.txt and "some text written without using context manager to file2" to from_file2.txt, and meanwhile copies this text to to_file.txt automatically, without any explicit copying.

How does it do it (context manager case):
with open(local_file1, 'w') as f: creates an object of the custom open class and initializes its path and mode variables. Because of the context manager (the with block), the __enter__ method is called, which opens the file using builtins.open() and returns the _io.TextIOWrapper object (an opened text file object). This is a normal file object, and we can use it for read/write operations as usual. At the end, the context manager calls the __exit__ method, which closes the file and then automatically calls the required callback function (here upload), passing in the path of the file that was just closed. In that callback function we can perform any operations we like, such as copying.

The non-context-manager case works similarly, except that there the __getattr__ method is the one doing the magic.
Here are the contents of the files after the code runs:

from_file1.txt:
some text written with context manager to file1

from_file2.txt:
some text written without using context manager to file2

to_file.txt:
some text written with context manager to file1
some text written without using context manager to file2
Based on your comment to Girish Dattatray Hegde, it seems that what you would like to do is something like the following to override the default __exit__ handler for open:
import io

old_exit = io.FileIO.__exit__  # builtin __exit__ method

def upload(self):
    print(self.read())  # just print out the contents

def new_exit(self):
    try:
        upload(self)
    finally:
        old_exit(self)  # invoke the builtin __exit__ method

io.FileIO.__exit__ = new_exit  # establish our __exit__ method

with open('test.html') as f:
    print(f.closed)  # False
print(f.closed)  # True
Unfortunately, the above code results in the following error:
test.py", line 18, in <module>
io.FileIO.__exit__ = new_exit # establish our __exit__ method
TypeError: can't set attributes of built-in/extension type '_io.FileIO'
So, I don't believe it is possible to do what you want to do. Ultimately you can create your own subclasses and override methods, but you cannot replace methods of the existing builtin open class.
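For example, here is a sketch of that subclassing route (my illustration, not from the original answer): overriding __exit__ on an io.FileIO subclass is allowed, unlike patching the builtin type itself:

import io

class UploadingFile(io.FileIO):
    def __exit__(self, *args):
        try:
            print('uploading', self.name)  # stand-in for a real upload step
        finally:
            super().__exit__(*args)  # always close via the builtin behavior

with UploadingFile('test.txt', 'w') as f:
    f.write(b'some data')  # FileIO is unbuffered binary, so write bytes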
I'm trying to build a routine that calls a Pytest class for each PDF document in the current directory... Let me explain.
Let's say I have this test file:
import pytest

class TestHeader:
    # asserts...

class TestBody:
    # asserts...
This script needs to test each PDF document in my cwd.
Here is my best attempt:
import glob
import pytest

class TestHeader:
    # asserts...

class TestBody:
    # asserts...

filelist = glob.glob('*.pdf')
for file in filelist:
    # magically call pytest for each file
How would I approach this?
EDIT: Complementing my question.
I have a huge function that extracts each document's data; let's call it extract_pdf. This function returns a tuple (header, body).
My current attempt looks like this:
import glob
import pytest

class TestHeader:
    # asserts...

class TestBody:
    # asserts...

filelist = glob.glob('*.pdf')
for file in filelist:
    header, body = extract_pdf(file)
    pytest.main(<pass header and body as args for pytest>)
I need to parse each document prior to testing. Can it be done this way?
The best way to do this is through dynamic parametrization of the test cases.
This can be achieved using the pytest_generate_tests hook:
import glob

def pytest_generate_tests(metafunc):
    filelist = glob.glob('*.pdf')
    metafunc.parametrize("fileName", filelist)
NOTE: fileName should be one of the arguments of your test function.
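A matching test could then look something like this sketch (extract_pdf is the question's parsing function; the assert is a placeholder):

def test_header(fileName):
    header, body = extract_pdf(fileName)
    assert header is not None  # your real header asserts go here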
This will result in the test case being executed for each of the files in the directory, with test IDs like

TestFunc[File1]
TestFunc[File2]
TestFunc[File3]

and so on.
This is expanding on the existing answer by @ArunKalirajaBaskaran.
The problem is that you have different test classes that want to use the same data, but you want to parse the data only once. If it is ok for you to read all data at once, you could read them into global variables and use these for parametrizing your tests:
import glob

def extract_data():
    filenames = []
    headers = []
    bodies = []
    for filename in glob.glob('*.pdf'):
        # extract_pdf is the parsing function from the question
        header, body = extract_pdf(filename)
        filenames.append(filename)
        headers.append(header)
        bodies.append(body)
    return filenames, headers, bodies

filenames, headers, bodies = extract_data()

def pytest_generate_tests(metafunc):
    if "header" in metafunc.fixturenames:
        # use the filename as ID for better test names
        metafunc.parametrize("header", headers, ids=filenames)
    elif "body" in metafunc.fixturenames:
        metafunc.parametrize("body", bodies, ids=filenames)

class TestHeader:
    def test_1(self, header):
        ...

    def test_2(self, header):
        ...

class TestBody:
    def test_1(self, body):
        ...
This is the same as using

class TestHeader:
    @pytest.mark.parametrize("header", headers, ids=filenames)
    def test_1(self, header):
        ...

    @pytest.mark.parametrize("header", headers, ids=filenames)
    def test_2(self, header):
        ...
pytest_generate_tests just adds a bit of convenience so you don't have to repeat the parametrize decorator for each test.
The downside of this is of course that you will read in all of the data at once, which may cause a problem with memory usage if there are a lot of files. Your approach with pytest.main will not work, because that is the same as calling pytest on the command line with the given parameters. Parametrization can be done at the fixture level or at the test level (like here), but both need the parameters already evaluated at load time, so I don't see a possibility to do this lazily (apart from putting it all into one test). Maybe someone else has a better idea...
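For reference, fixture-level parametrization of the same data might look like the sketch below (hypothetical names, reusing the globals from above); it is still evaluated at collection time, so it does not help with laziness:

import pytest

@pytest.fixture(params=list(zip(headers, bodies)), ids=filenames)
def pdf_data(request):
    # each param is one (header, body) tuple from extract_data()
    return request.param

def test_pdf(pdf_data):
    header, body = pdf_data
    assert header is not None  # placeholder assert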
I'm writing a Python command line utility that involves converting a string into a TextBlob, which is part of a natural language processing module. Importing the module is very slow, ~300 ms on my system. For speediness, I created a memoized function that converts text to a TextBlob only the first time the function is called. Importantly, if I run my script over the same text twice, I want to avoid reimporting TextBlob and recomputing the blob, instead pulling it from the cache.

That's all done and works fine, except that, for some reason, the function is still very slow. In fact, it's as slow as it was before. I think this must be because the module is getting reimported even though the function is memoized and the import statement happens inside the memoized function.
The goal here is to fix the following code so that the memoized runs are as speedy as they ought to be, given that the result does not need to be recomputed.
Here's a minimal example of the core code:
@memoize
def make_blob(text):
    import textblob
    return textblob.TextBlob(text)

if __name__ == '__main__':
    make_blob("hello")
And here's the memoization decorator:
import os
import shelve
import functools
import inspect

def memoize(f):
    """Cache results of computations on disk in a directory called 'cache'."""
    path_of_this_file = os.path.dirname(os.path.realpath(__file__))
    cache_dirname = os.path.join(path_of_this_file, "cache")

    if not os.path.isdir(cache_dirname):
        os.mkdir(cache_dirname)

    cache_filename = f.__module__ + "." + f.__name__
    cachepath = os.path.join(cache_dirname, cache_filename)

    try:
        cache = shelve.open(cachepath, protocol=2)
    except Exception:
        print('Could not open cache file %s, maybe name collision' % cachepath)
        cache = None

    @functools.wraps(f)
    def wrapped(*args, **kwargs):
        argdict = {}

        # handle instance methods
        if hasattr(f, '__self__'):
            args = args[1:]

        tempargdict = inspect.getcallargs(f, *args, **kwargs)
        for k, v in tempargdict.items():
            argdict[k] = v

        key = str(hash(frozenset(argdict.items())))

        try:
            return cache[key]
        except KeyError:
            value = f(*args, **kwargs)
            cache[key] = value
            cache.sync()
            return value
        except TypeError:
            call_to = f.__module__ + '.' + f.__name__
            print('Warning: could not disk cache call to %s; '
                  'it probably has unhashable args' % call_to)
            return f(*args, **kwargs)

    return wrapped
And here's a demonstration that the memoization doesn't currently save any time:
❯ time python test.py
python test.py 0.33s user 0.11s system 100% cpu 0.437 total
~/Desktop
❯ time python test.py
python test.py 0.33s user 0.11s system 100% cpu 0.436 total
This is happening even though the function is correctly being memoized (print statements put inside the memoized function only give output the first time the script is run).
I've put everything together into a GitHub Gist in case it's helpful.
What about a different approach:
import pickle

CACHE_FILE = 'cache.pkl'

try:
    with open(CACHE_FILE, 'rb') as pkl:
        obj = pickle.load(pkl)
except Exception:
    import slowmodule
    obj = "something"
    with open(CACHE_FILE, 'wb') as pkl:
        pickle.dump(obj, pkl)

print(obj)
Here we cache the object, not the module. Note that this will not give you any savings if the object you're caching requires slowmodule. So in the above example you would see savings, since "something" is a string and doesn't require the slowmodule module to understand it. But if you did something like
obj = slowmodule.Foo("bar")
The unpickling process would automatically import slowmodule, negating any benefit of caching.
So if you can manipulate textblob.TextBlob(text) into something that, when unpickled, doesn't require the textblob module, then you'll see savings using this approach.
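For illustration, here's a sketch of that idea; it assumes (hypothetically) that all you need downstream are the blob's noun phrases, which can be stored as a plain list of strings and therefore unpickle without importing textblob:

import pickle

CACHE_FILE = 'blob_cache.pkl'  # hypothetical cache file name

def make_blob_data(text):
    try:
        with open(CACHE_FILE, 'rb') as pkl:
            return pickle.load(pkl)  # plain strings: no textblob import needed
    except Exception:
        import textblob  # the slow import happens only on a cache miss
        data = [str(p) for p in textblob.TextBlob(text).noun_phrases]
        with open(CACHE_FILE, 'wb') as pkl:
            pickle.dump(data, pkl)  # keying the cache by text is omitted for brevity
        return data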