I am working on a custom file path class, which should always execute a function
after the corresponding system file has been written to and its file object
closed. The function will upload the contents of the file path to a remote location.
I want the upload functionality to happen entirely behind the scenes from a user
perspective, i.e. the user can use the class just like any other os.PathLike
class and automatically get the upload functionality. Pseudo code below for
reference.
import os

class CustomPath(os.PathLike):
    def __init__(self, remote_path: str):
        self._local_path = "/some/local/path"
        self._remote_path = remote_path

    def __fspath__(self) -> str:
        return self._local_path

    def upload(self):
        # Upload local path to remote path.
        ...
I can of course handle automatically calling the upload function when the
user calls any of the methods directly.
However, it is unclear to me how to automatically call the upload function if
someone writes to the file with the builtin open as follows.
custom_path = CustomPath("some remote location")
with open(custom_path, "w") as handle:
    handle.write("Here is some text.")
or
custom_path = CustomPath("some remote location")
handle = open(custom_path, "w")
handle.write("Here is some text.")
handle.close()
I desire compatibility with invocations of the open function, so that the
upload behavior will work with all third-party file writers. Is this kind of
behavior possible in Python?
Yes, it is possible in Python by making use of function overriding, a custom context manager, and the __getattr__ facility. Here's the basic logic:
override the builtins.open() function with a custom open class.
make it compatible with context managers using the __enter__ and __exit__ methods.
make it compatible with normal read/write operations using the __getattr__ method.
call the builtins method from the class whenever necessary.
automatically invoke the callback function when the close() method is called.
Here's the sample code:
import builtins
import os

to_be_monitored = ['from_file1.txt', 'from_file2.txt']

# callback function (called when a file closes)
def upload(content_file):
    # check whether this file should be mirrored
    if content_file in to_be_monitored:
        # copy the contents
        with builtins.open(content_file, 'r') as ff:
            with builtins.open(remote_file, 'a') as tf:
                # some logic for writing only new contents could be added here
                tf.write('\n' + ff.read())
class open(object):
    def __init__(self, path, mode):
        self.path = path
        self.mode = mode
        self.file = None

    # called when the context manager is entered
    def __enter__(self):
        self.file = builtins.open(self.path, self.mode)
        return self.file

    # called when the context manager exits
    def __exit__(self, *args):
        self.file.close()
        # after closing, call upload()
        upload(self.path)
        return False  # don't suppress exceptions from the with block

    # called on attribute access when no context manager is used
    def __getattr__(self, item):
        # open the underlying file once, on first access
        if self.file is None:
            self.file = builtins.open(self.path, self.mode)
        if item == 'close':
            # wrap close() so upload() runs after the file is really closed
            def close():
                self.file.close()
                upload(self.path)
            return close
        return getattr(self.file, item)
if __name__ == '__main__':
    remote_file = 'to_file.txt'
    local_file1 = 'from_file1.txt'
    local_file2 = 'from_file2.txt'

    # just checks for and creates the remote file; not related to the actual problem
    if not os.path.isfile(remote_file):
        f = builtins.open(remote_file, 'w')
        f.close()

    # DRIVER CODE
    # writing with a context manager
    with open(local_file1, 'w') as f:
        f.write('some text written with context manager to file1')

    # writing without a context manager
    f = open(local_file2, 'w')
    f.write('some text written without using context manager to file2')
    f.close()

    # reading the remote file
    with open(remote_file, 'r') as f:
        print('remote file contains:\n', f.read())
What does it do:
Writes "some text written with context manager to file1" to from_file1.txt and "some text written without using context manager to file2" to from_file2.txt, and meanwhile copies that text to to_file.txt automatically, without any explicit copying.
How does it do it (context manager case):
with open(local_file1, 'w') as f: creates an object of the custom open class and initializes its path and mode attributes. The with block then calls the __enter__ method, which opens the file using builtins.open() and returns the _io.TextIOWrapper (opened text file) object. This is a normal file object, so it can be used for read/write operations as usual. At the end of the block the context manager calls the __exit__ method, which closes the file and then automatically calls the callback (here, upload), passing it the path of the file that was just closed. Inside the callback we can perform any operation we like, such as copying.
The non-context-manager case works similarly, except that there __getattr__ is the method doing the magic.
Here are the contents of the files after the code executes:
from_file1.txt
some text written with context manager to file1
from_file2.txt
some text written without using context manager to file2
to_file.txt
some text written with context manager to file1
some text written without using context manager to file2
Based on your comment to Girish Dattatray Hegde, it seems that what you would like to do is something like the following to override the default __exit__ handler for open:
import io

old_exit = io.FileIO.__exit__  # builtin __exit__ method

def upload(self):
    print(self.read())  # just print out the contents

def new_exit(self, *args):
    try:
        upload(self)
    finally:
        old_exit(self, *args)  # invoke the builtin __exit__ method

io.FileIO.__exit__ = new_exit  # establish our __exit__ method

with open('test.html') as f:
    print(f.closed)  # False
print(f.closed)  # True
Unfortunately, the above code results in the following error:
test.py", line 18, in <module>
io.FileIO.__exit__ = new_exit # establish our __exit__ method
TypeError: can't set attributes of built-in/extension type '_io.FileIO'
So, I don't believe it is possible to do what you want to do this way. Ultimately you can create your own subclasses and override methods, but you cannot replace methods of the existing builtin file classes behind open.
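What does remain possible is wrapping rather than patching: a small proxy class that delegates everything to a real file object and runs a callback after close(). Below is a minimal sketch, assuming a callback that takes the file path, like the upload() in the first answer; note the limitation stands, since third-party code that calls the builtin open() directly will never see this wrapper:
import builtins

class UploadOnClose:
    # delegates to a real file object and runs a callback after close()
    def __init__(self, path, mode='r', callback=None):
        self._file = builtins.open(path, mode)
        self._path = path
        self._callback = callback

    def __getattr__(self, name):
        # forward everything else (write, read, flush, ...) to the real file
        return getattr(self._file, name)

    def close(self):
        self._file.close()
        if self._callback is not None:
            self._callback(self._path)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
        return False  # let exceptions propagate

with UploadOnClose('from_file1.txt', 'w', callback=upload) as f:
    f.write('text that triggers upload() on close')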
I want to test a Python function that reads a gzip file and extracts something from the file (using pytest).
import gzip

def my_function(file_path):
    output = []
    with gzip.open(file_path, 'rt') as f:
        for line in f:
            output.append('something from line')
    return output
Can I create a gzip file-like object that I can pass to my_function? The object should have defined content and should work with gzip.open().
I know that I can create a temporary gzip file in a fixture but this depends on the filesystem and other properties of the environment. Creating a file-like object from code would be more portable.
You can use the io and gzip libraries to create in-memory file objects. Example:
import io
import gzip

def inmem():
    stream = io.BytesIO()
    with gzip.open(stream, 'wb') as f:
        f.write(b'spam\neggs\n')
    stream.seek(0)
    return stream
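Because gzip.open() accepts an existing file object as well as a filename, the stream returned by inmem() can be passed straight to the function under test. A minimal pytest usage sketch, assuming my_function from the question is importable:
def test_my_function():
    stream = inmem()
    # two lines in the gzip stream, so two extracted items
    assert my_function(stream) == ['something from line', 'something from line']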
You should never try to test outside code in a unit test. Only test the code you wrote. If you're testing gzip, then gzip is doing something wrong (they should be writing their own unit tests). Instead, do something like this:
from unittest import mock

# patch gzip in the module where my_function looks it up
# ('my_module' here stands for whichever module defines my_function)
@mock.patch('my_module.gzip')
def test_my_function(mock_gzip):
    file_path = 'testpath'
    output = my_function(file_path=file_path)
    mock_gzip.open.assert_called_with(file_path, 'rt')
    assert output == b'<whatever you expect to be returned from your method>'
That's your whole unit test. All you want to know is that gzip.open() was called (and you assume it works or else gzip is failing and that's their problem) and that you got back what you expected from the method being tested. You specify what gzip returns based on what you expect it to return, but you don't actually call the function in your test.
It's a bit verbose but I'd do something like this (I have assumed that you saved my_function to a file called patch_one.py):
import patch_one  # this is the file with my_function in it
from unittest.mock import patch
from unittest import TestCase

class MyTestCase(TestCase):
    def test_my_function(self):
        # because you used "with open(...) as f", we need a mock context
        class MyContext:
            def __enter__(self, *args, **kwargs):
                return [1, 2]  # note the two items

            def __exit__(self, *args, **kwargs):
                return None

        # in case we want to know the arguments to open()
        open_args = None

        def f(*args, **kwargs):
            def my_open(*args, **kwargs):
                nonlocal open_args
                open_args = args
                return MyContext()
            return my_open

        # patch the gzip.open in our file under test
        with patch('patch_one.gzip.open', new_callable=f):
            # finally, we can call the function we want to test
            ret_val = patch_one.my_function('not a real file path')

        # note the two items, corresponding to the list in __enter__()
        self.assertListEqual(['something from line', 'something from line'], ret_val)
        # check the arguments, just for fun
        self.assertEqual('rt', open_args[1])
If you want to try anything more complicated, I would recommend reading the unittest mock docs because how you import the "patch_one" file matters as does the string you pass to patch().
There will definitely be a way to do this with Mock or MagicMock but I find them a bit hard to debug so I went the long way round.
I have a function that parses a given string with specific rules. I would like to design a CLI interface for this function. The problem is that I want users to be able to call this function via the CLI with READER and WRITER functions of their own. To make it clear, here is some sample code and a demonstration of what I'm trying to explain.
# mylib.py
# piece of code that belongs to my lib
def parser(_id, text):
    # parse the text & do some magic
    return (_id, parsed_text)

# user-side code
def reader():
    # read from a database
    # or file or network or who knows where
    yield (_id, text)

# user-side code
def writer(_id, text):
    # write to somewhere
    return True  # or False, depending on the write action
A sample call should be something like this:
$ python mylib.py --reader <something-that-I-dont-know>
I don't want to use eval tricks, but I also want the user to be flexible when passing data to my library. Is this possible? Or should I try another approach?
With the help of @AlexHall, I've come up with the following solution:
import pathlib
import importlib.util

def load_module(filepath):
    module_path = pathlib.Path(filepath)
    abs_path = module_path.resolve()
    module_name = module_path.stem
    spec = importlib.util.spec_from_file_location(module_name, abs_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
Using this function, I am able to import any valid Python module that exists in the filesystem, even if the module is not on sys.path.
Here is a sample usage:
parser = make_parser(prog="tokenizer")
args = parser.parse_args()
module = load_module(args.writer) # if nothing is passed, default action defined in the parser
writer = module.writer
module = load_module(args.reader)
reader = module.reader
# do what you want to do with them
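make_parser is not shown in the answer; a minimal sketch of what it could look like, built on argparse (the default paths here are hypothetical):
import argparse

def make_parser(prog):
    parser = argparse.ArgumentParser(prog=prog)
    # each argument takes a path to a Python file defining reader()/writer()
    parser.add_argument('--reader', default='default_reader.py')
    parser.add_argument('--writer', default='default_writer.py')
    return parser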
I ran into a problem. With the code below, I am trying to simplify several file/JSON operations in a large script.
Pointer.py
import json

class Pointers:
    def __init__(self, target_file, mode, data):
        self.target_file = target_file  # file name and path to load/store
        self.data = data  # data to load/store
        self.mode = mode  # mode on the data

    # some other functions

    # writer object for non-JSON files
    def sys_writer_4file(self):
        with open(self.target_file, self.mode) as write_pointer:
            handler = write_pointer.write(self.data)
        return handler
But when I try calling it from another script like below,
Report.py
from f_pointers import Pointers

class Something:
    def someElse(self, url):
        self.url = url

    def someNonStaticFunction(self):
        path = "./filepath/filename"
        somedata = data
        Pointers.sys_writer_4file("./filepath/filename", 'wb', somedata)
I get an unexpected argument warning from my IDE's inspection saying,
This inspection reports discrepancies between declared parameters and
actual arguments, as well as incorrect arguments (e.g. duplicate named
arguments) and incorrect argument order. Decorators are analyzed, too.
On this line:
Pointers.sys_writer_4file("./filepath/filename", 'wb', somedata)
Can someone advise me how I should create the object?
Thanks in advance.
You have to instantiate your class first:
Pointers("./filepath/filename", 'wb', somedata).sys_writer_4file()
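In context, the calling method might then look like this (a sketch; the data is assumed to be bytes, since the mode is 'wb'):
from f_pointers import Pointers

class Something:
    def someNonStaticFunction(self):
        somedata = b"payload to write"  # hypothetical data
        # create an instance, then call the method on it
        Pointers("./filepath/filename", 'wb', somedata).sys_writer_4file()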
I am using Pytest to test an executable. This .exe file reads a configuration file on startup.
I have written a fixture to spawn this .exe file at the start of each test and close it down at the end of the test. However, I cannot work out how to tell the fixture which configuration file to use. I want the fixture to copy a specified config file to a directory before spawning the .exe file.
@pytest.fixture
def session(request):
    copy_config_file(specific_file)  # how do I specify the file to use?
    link = spawn_exe()
    def fin():
        close_down_exe()
    request.addfinalizer(fin)
    return link
# needs to use config file foo.xml
def test_1(session):
    session.talk_to_exe()

# needs to use config file bar.xml
def test_2(session):
    session.talk_to_exe()
How do I tell the fixture to use foo.xml for test_1 function and bar.xml for test_2 function?
Thanks
John
One solution is to use pytest.mark for that:
import pytest

@pytest.fixture
def session(request):
    m = request.node.get_closest_marker('session_config')
    if m is None:
        pytest.fail('please use "session_config" marker')
    specific_file = m.args[0]
    copy_config_file(specific_file)
    link = spawn_exe()
    yield link
    close_down_exe(link)

@pytest.mark.session_config("foo.xml")
def test_1(session):
    session.talk_to_exe()

@pytest.mark.session_config("bar.xml")
def test_2(session):
    session.talk_to_exe()
Another approach would be to just change your session fixture slightly to delegate the creation of the link to the test function:
import pytest

@pytest.fixture
def session_factory(request):
    links = []
    def make_link(specific_file):
        copy_config_file(specific_file)
        link = spawn_exe()
        links.append(link)
        return link
    yield make_link
    for link in links:
        close_down_exe(link)

def test_1(session_factory):
    session = session_factory('foo.xml')
    session.talk_to_exe()

def test_2(session_factory):
    session = session_factory('bar.xml')
    session.talk_to_exe()
I prefer the latter, as it is simpler to understand and allows for more improvements later, for example if you need to use @pytest.mark.parametrize in a test based on the config value. Also notice that the latter allows you to spawn more than one executable in the same test.
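For example, a minimal sketch of how the factory combines with parametrize (the config file names are the ones from the question):
import pytest

@pytest.mark.parametrize('config_file', ['foo.xml', 'bar.xml'])
def test_with_any_config(session_factory, config_file):
    # one spawned executable per parametrized config file
    session = session_factory(config_file)
    session.talk_to_exe()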
from flask import Flask
from ftplib import FTP

app = Flask(__name__)

@app.route("/")
def hello():
    address = "someserver"
    ftp = FTP(address)
    ftp.login()
    return ftp.retrlines("LIST")

if __name__ == "__main__":
    app.run()
...this gives me the following output:
226-Options: -l 226 1 matches total
The question is - why does not this print the output of retrlines and how do I do so?
The documentation for the ftplib.FTP class says that retrlines takes an optional callback - if no callback is provided "The default callback prints the line to sys.stdout." This means that the method retrlines does not actually return the data provided - it simply passes each line as it receives it to a callable that may be passed to it. This leaves you with a couple of options:
Pass in a callable that can stores the results of being called multiple times:
def fetchlines(line=None):
    if line is not None:
        # as long as we are called with a line,
        # store the line in the list attached to this function
        fetchlines.lines.append(line)
    else:
        # when we are called without a line,
        # we are retrieving the lines;
        # truncate the list after copying it
        # so we can re-use this function
        lines = fetchlines.lines[:]
        fetchlines.lines = []
        return lines

fetchlines.lines = []

@app.route("/")
def hello():
    ftp = FTP("someaddress")
    ftp.login()
    ftp.dir(fetchlines)
    lines = fetchlines()
    return "<br>".join(lines)
Replace sys.stdout with a file-like object (from cStringIO for example) and then simply read the file afterwards:
from cStringIO import StringIO
import sys

# save a reference to stdout
STANDARD_OUT = sys.stdout

@app.route("/")
def hello():
    ftp = FTP("someaddress")
    ftp.login()
    # point sys.stdout at a file-like object rather than a terminal
    file_like = StringIO()
    sys.stdout = file_like
    ftp.dir()
    # lines in this case will be a string, not a list
    lines = file_like.getvalue()
    sys.stdout = STANDARD_OUT
    file_like.close()
    return lines
Neither of these techniques will hold up well under a lot of load - or even under any real concurrency. There are ways to solve for that, but I'll leave that for another day.
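On Python 3, contextlib.redirect_stdout makes the second approach less error-prone, though it is still unsafe under concurrency since sys.stdout is process-global. A sketch, keeping the hypothetical "someaddress" server from above:
import io
from contextlib import redirect_stdout
from ftplib import FTP

@app.route("/")
def hello():
    ftp = FTP("someaddress")
    ftp.login()
    buffer = io.StringIO()
    # temporarily route everything printed to sys.stdout into the buffer
    with redirect_stdout(buffer):
        ftp.dir()
    return buffer.getvalue()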