I ran into a problem. With the code below I am trying to simplify the handling of several file/JSON objects in a large script.
Pointer.py
import json

class Pointers:
    def __init__(self, target_file, mode, data):
        self.target_file = target_file  # file name and path to load/store
        self.data = data  # data to load/store
        self.mode = mode  # mode on the data

    # some other functions

    # Writer object for non-json files
    def sys_writer_4file(self):
        with open(self.target_file, self.mode) as write_pointer:
            handler = write_pointer.write(self.data)
            write_pointer.close()
        return handler
But when I try calling it from another script like below,
Report.py
from f_pointers import Pointer

class Something:
    def someElse(self, url):
        self.url = url

    def someNonStaticFunction(self):
        path = "./filepath/filename"
        someData = data
        Pointers.sys_writer_4file(("./filepath/filename", 'wb', somedata)
I get the unexpected argument warning from my interpreter saying,
This inspection reports discrepancies between declared parameters and
actual arguments, as well as incorrect arguments (e.g. duplicate named
arguments) and incorrect argument order. Decorators are analyzed, too.
On this line:
Pointers.sys_writer_4file(("./filepath/filename", 'wb', somedata)
Can someone advise me how I should create the object?
Thanks in advance.
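You have to instantiate the class first. sys_writer_4file is an instance method, so it must be called on an object built with the three constructor arguments: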
Pointers("./filepath/filename", 'wb', somedata).sys_writer_4file()
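For completeness, here is a minimal runnable sketch of that call. It repeats a trimmed version of the Pointers class from the question and writes to a temporary directory instead of ./filepath/filename; the temporary path and the b"payload" data are assumptions made only so the example can run anywhere.

```python
import os
import tempfile


class Pointers:
    def __init__(self, target_file, mode, data):
        self.target_file = target_file  # file name and path to load/store
        self.mode = mode                # mode passed to open()
        self.data = data                # data to store

    def sys_writer_4file(self):
        # The with block closes the file, so no explicit close() is needed.
        with open(self.target_file, self.mode) as write_pointer:
            return write_pointer.write(self.data)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "filename")  # stand-in for ./filepath/filename
    # Create the object first, then call the method on it.
    written = Pointers(path, "wb", b"payload").sys_writer_4file()
    with open(path, "rb") as fh:
        content = fh.read()

print(written, content)
```

Calling Pointers.sys_writer_4file(...) on the class itself passes the tuple of arguments as self, which is why the inspection complains.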
I am working on a custom file path class, which should always execute a function
after the corresponding system file has been written to and its file object
closed. The function will upload the contents of file path to a remote location.
I want the upload functionality to happen entirely behind the scenes from a user
perspective, i.e. the user can use the class just like any other os.PathLike
class and automatically get the upload functionality. Pseudocode below for
reference.
import os

class CustomPath(os.PathLike):
    def __init__(self, remote_path: str):
        self._local_path = "/some/local/path"
        self._remote_path = remote_path

    def __fspath__(self) -> str:
        return self._local_path

    def upload(self):
        # Upload local path to remote path.
        ...
I can of course call the upload function automatically when the
user calls any of the class's methods directly.
However, it is unclear to me how to call the upload function automatically if
someone writes to the file with the builtin open, as follows.
custom_path = CustomPath("some remote location")
with open(custom_path, "w") as handle:
    handle.write("Here is some text.")
or
custom_path = CustomPath("some remote location")
handle = open(custom_path, "w")
handle.write("Here is some text.")
handle.close()
I desire compatibility with invocations of the open function, so that the
upload behavior will work with all third party file writers. Is this kind of
behavior possible in Python?
Yes, it is possible in Python by combining function overriding, a custom context manager, and __getattr__. Here's the basic logic:
Override the builtins.open() function with a custom open class.
Make it work as a context manager by implementing the __enter__ and __exit__ methods.
Make it work for normal read/write operations by implementing the __getattr__ method.
Call the builtins function from the class whenever necessary.
Automatically invoke a callback function when the close() method is called.
Here's the sample code:
import builtins
import os

to_be_monitored = ['from_file1.txt', 'from_file2.txt']

# callback function (called when file closes)
def upload(content_file):
    # check for required file
    if content_file in to_be_monitored:
        # copy the contents
        with builtins.open(content_file, 'r') as ff:
            with builtins.open(remote_file, 'a') as tf:
                # some logic for writing only new contents can be used here
                tf.write('\n' + ff.read())

class open(object):
    def __init__(self, path, mode):
        self.path = path
        self.mode = mode
        self.file = None

    # open the underlying file only once, on first use
    def _ensure_open(self):
        if self.file is None:
            self.file = builtins.open(self.path, self.mode)
        return self.file

    # called when the with block is entered
    def __enter__(self):
        return self._ensure_open()

    # called when the with block is left
    def __exit__(self, *args):
        self.file.close()
        # after closing, call upload()
        upload(self.path)
        return False  # do not suppress exceptions raised inside the block

    # called for attributes this class lacks (normal, non context manager use)
    def __getattr__(self, item):
        attr = getattr(self._ensure_open(), item)
        if item != 'close':
            return attr

        # wrap close() so that upload() runs after the file is really closed
        def close_and_upload():
            attr()
            upload(self.path)
        return close_and_upload

if __name__ == '__main__':
    remote_file = 'to_file.txt'
    local_file1 = 'from_file1.txt'
    local_file2 = 'from_file2.txt'

    # just checks and creates the remote file, not related to the actual problem
    if not os.path.isfile(remote_file):
        f = builtins.open(remote_file, 'w')
        f.close()

    # DRIVER CODE
    # writing with context manager
    with open(local_file1, 'w') as f:
        f.write('some text written with context manager to file1')

    # writing without context manager
    f = open(local_file2, 'w')
    f.write('some text written without using context manager to file2')
    f.close()

    # reading file
    with open(remote_file, 'r') as f:
        print('remote file contains:\n', f.read())
What it does:
Writes "some text written with context manager to file1" to from_file1.txt and "some text written without using context manager to file2" to from_file2.txt, and automatically copies both texts to to_file.txt without any explicit copy in the driver code.
How it does it (context manager case):
with open(local_file1, 'w') as f: creates an object of the custom class open and initializes its path and mode attributes. Because of the with ... as block, Python then calls __enter__, which opens the file with builtins.open() and returns the resulting _io.TextIOWrapper (an open text file object). It is a normal file object, so we can use it as usual for read/write operations. At the end of the block the context manager calls __exit__, which closes the file, automatically calls the required callback (here upload), and passes it the path of the file just closed. In this callback we can perform any operation, such as copying.
The non-context-manager case works similarly, except that __getattr__ is the method doing the magic: it forwards attribute lookups to the real file object and triggers upload when close is requested.
Here are the contents of the files after running the code:
from_file1.txt
some text written with context manager to file1
from_file2.txt
some text written without using context manager to file2
to_file.txt
some text written with context manager to file1
some text written without using context manager to file2
Based on your comment to Girish Dattatray Hegde, it seems that what you would like to do is something like the following to override the default __exit__ handler for open:
import io

old_exit = io.FileIO.__exit__  # builtin __exit__ method

def upload(self):
    print(self.read())  # just print out contents

def new_exit(self):
    try:
        upload(self)
    finally:
        old_exit(self)  # invoke the builtin __exit__ method

io.FileIO.__exit__ = new_exit  # establish our __exit__ method

with open('test.html') as f:
    print(f.closed)  # False
print(f.closed)  # True
Unfortunately, the above code results in the following error:
test.py", line 18, in <module>
io.FileIO.__exit__ = new_exit # establish our __exit__ method
TypeError: can't set attributes of built-in/extension type '_io.FileIO'
So I don't believe it is possible to do what you want. You can create your own subclasses and override their methods, but you cannot replace methods of the existing built-in file classes that open returns.
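To illustrate the subclass route, here is a minimal sketch, assuming you control the code that opens the file. UploadingFile, on_close, and the uploaded list are hypothetical names introduced only for this example, and the "upload" merely records the path.

```python
import io
import os
import tempfile


class UploadingFile(io.FileIO):
    """Hypothetical FileIO subclass that runs a callback after close()."""

    def __init__(self, path, mode="r", on_close=None):
        # Set our own attributes first so close() can always rely on them.
        self._upload_path = path
        self._on_close = on_close
        super().__init__(path, mode)

    def close(self):
        was_open = not self.closed
        super().close()
        # Fire the callback only on the first real close.
        if was_open and self._on_close is not None:
            self._on_close(self._upload_path)


uploaded = []  # stands in for a real upload; it only records the path

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.html")
    # Leaving the with block calls close(), which triggers the callback.
    with UploadingFile(path, "w", on_close=uploaded.append) as f:
        f.write(b"<p>hello</p>")
    closed_after_block = f.closed
    with open(path, "rb") as fh:
        content = fh.read()

print(uploaded == [path], closed_after_block, content)
```

This only helps for code that constructs UploadingFile itself. Third-party writers that call the builtin open still get an ordinary file object, which is the limitation described above.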
Suppose in "./data_writers/excel_data_writer.py", I have:
from generic_data_writer import GenericDataWriter

class ExcelDataWriter(GenericDataWriter):
    def __init__(self, config):
        super().__init__(config)
        self.sheet_name = config.get('sheetname')

    def write_data(self, pandas_dataframe):
        pandas_dataframe.to_excel(
            self.get_output_file_path_and_name(),  # implemented in GenericDataWriter
            sheet_name=self.sheet_name,
            index=self.index)
In "./data_writers/csv_data_writer.py", I have:
from generic_data_writer import GenericDataWriter

class CSVDataWriter(GenericDataWriter):
    def __init__(self, config):
        super().__init__(config)
        self.delimiter = config.get('delimiter')
        self.encoding = config.get('encoding')

    def write_data(self, pandas_dataframe):
        pandas_dataframe.to_csv(
            self.get_output_file_path_and_name(),  # implemented in GenericDataWriter
            sep=self.delimiter,
            encoding=self.encoding,
            index=self.index)
In "./data_writers/generic_data_writer.py", I have:
import os

class GenericDataWriter:
    def __init__(self, config):
        self.output_folder = config.get('output_folder')
        self.output_file_name = config.get('output_file')
        self.output_file_path_and_name = os.path.join(self.output_folder, self.output_file_name)
        # whether to include index column from Pandas' dataframe in the output file
        self.index = config.get('include_index')
Suppose I have a JSON config file that has a key-value pair like this:
{
    "__comment__": "Here, user can provide the path and python file name of the custom data writer module she wants to use.",
    "custom_data_writer_module": "./data_writers/excel_data_writer.py",
    "there_are_more_key_value_pairs_in_this_JSON_config_file": "for other input parameters"
}
In "main.py", I want to import the data writer module based on the custom_data_writer_module provided in the JSON config file above. So I wrote this:
import os
import importlib

def main():
    # Do other things to read and process data
    data_writer_class_file = config.get('custom_data_writer_module')
    data_writer_module = importlib.import_module(
        os.path.splitext(os.path.split(data_writer_class_file)[1])[0])
    # <=== Here, what should I do to instantiate the right specific data writer (Excel or CSV) class instance?
    dw = data_writer_module.what_should_this_be?
    for df in dataframes_to_write_to_output_file:
        dw.write_data(df)

if __name__ == "__main__":
    main()
As I asked in the code above, I want to know if there's a way to retrieve and instantiate the class defined in a Python module assuming that there is ONLY ONE class defined in the module. Or if there is a better way to refactor my code (using some sort of pattern) without changing the structure of JSON config file described above, I'd like to learn from Python experts on StackOverflow. Thank you in advance for your suggestions!
You can do this easily with vars:
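cls1, = [v for k, v in vars(data_writer_module).items()
         if isinstance(v, type)]
dw = cls1(config)
The trailing comma in cls1, = ... unpacks a one-element list, so it raises ValueError unless exactly one class is found. If the module is allowed to do anything like from collections import deque (or even foo=str), you might need to filter based on v.__module__. Your writer modules already do this with from generic_data_writer import GenericDataWriter, so the filter is needed there.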
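A minimal sketch of that __module__ filter follows. find_single_class is a hypothetical helper name, and the demo module is built in memory with types.ModuleType only so the example runs without the data_writers files; in main.py you would pass the module returned by importlib.import_module instead.

```python
import types


def find_single_class(module):
    """Return the one class defined in `module` itself (hypothetical helper)."""
    # The one-element unpacking raises ValueError if zero or several classes match.
    cls, = [
        value for value in vars(module).values()
        if isinstance(value, type) and value.__module__ == module.__name__
    ]
    return cls


# Stand-in for an imported writer module; it also imports a foreign class.
demo = types.ModuleType("excel_data_writer")
exec(
    "from collections import deque\n"
    "class ExcelDataWriter:\n"
    "    def __init__(self, config):\n"
    "        self.config = config\n",
    demo.__dict__,
)

# deque is skipped because its __module__ is 'collections'.
dw = find_single_class(demo)({"sheetname": "Sheet1"})
print(type(dw).__name__)
```

Classes imported from elsewhere keep the __module__ of the place where they were defined, which is what lets the filter ignore them.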
I want to test a Python function that reads a gzip file and extracts something from the file (using pytest).
import gzip

def my_function(file_path):
    output = []
    with gzip.open(file_path, 'rt') as f:
        for line in f:
            output.append('something from line')
    return output
Can I create a gzip file-like object that I can pass to my_function? The object should have defined content and should work with gzip.open().
I know that I can create a temporary gzip file in a fixture but this depends on the filesystem and other properties of the environment. Creating a file-like object from code would be more portable.
You can use the io and gzip libraries to create in-memory file objects. Example:
import io, gzip

def inmem():
    stream = io.BytesIO()
    with gzip.open(stream, 'wb') as f:
        f.write(b'spam\neggs\n')
    stream.seek(0)
    return stream
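Here is a short sketch of how that in-memory stream can be fed to the function from the question. my_function is repeated so the example is self-contained; the payload parameter and the test name are assumptions added for illustration.

```python
import gzip
import io


def my_function(file_path):
    # Function under test, repeated from the question.
    output = []
    with gzip.open(file_path, 'rt') as f:
        for line in f:
            output.append('something from line')
    return output


def inmem(payload=b'spam\neggs\n'):
    # Build a gzip-compressed stream entirely in memory.
    stream = io.BytesIO()
    with gzip.open(stream, 'wb') as f:
        f.write(payload)
    stream.seek(0)  # rewind so the reader starts at the gzip header
    return stream


def test_my_function_with_in_memory_gzip():
    # Two lines in the payload give two entries in the output.
    assert my_function(inmem()) == ['something from line'] * 2


result = my_function(inmem())
print(result)
```

This works because gzip.open accepts an existing file object as well as a path, so no temporary file is needed.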
You should never try to test outside code in a unit test. Only test the code you wrote. If you're testing gzip, then gzip is doing something wrong (they should be writing their own unit tests). Instead, do something like this:
from unittest import mock

@mock.patch('gzip.open')
def test_my_function(mock_open):
    # lines the fake gzip file yields when it is iterated
    mock_open.return_value.__enter__.return_value = ['line 1\n', 'line 2\n']
    file_path = 'testpath'
    output = my_function(file_path=file_path)
    mock_open.assert_called_with(file_path, 'rt')
    assert output == ['something from line', 'something from line']
That's your whole unit test. All you want to know is that gzip.open() was called (and you assume it works or else gzip is failing and that's their problem) and that you got back what you expected from the method being tested. You specify what gzip returns based on what you expect it to return, but you don't actually call the function in your test.
It's a bit verbose but I'd do something like this (I have assumed that you saved my_function to a file called patch_one.py):
import patch_one  # this is the file with my_function in it
from unittest.mock import patch
from unittest import TestCase

class MyTestCase(TestCase):
    def test_my_function(self):
        # because you used "with open(...) as f", we need a mock context
        class MyContext:
            def __enter__(self, *args, **kwargs):
                return [1, 2]  # note the two items

            def __exit__(self, *args, **kwargs):
                return None

        # in case we want to know the arguments to open()
        open_args = None

        def f(*args, **kwargs):
            def my_open(*args, **kwargs):
                nonlocal open_args
                open_args = args
                return MyContext()
            return my_open

        # patch the gzip.open in our file under test
        with patch('patch_one.gzip.open', new_callable=f):
            # finally, we can call the function we want to test
            ret_val = patch_one.my_function('not a real file path')

        # note the two items, corresponding to the list in __enter__()
        self.assertListEqual(['something from line', 'something from line'], ret_val)
        # check the arguments, just for fun
        self.assertEqual('rt', open_args[1])
If you want to try anything more complicated, I would recommend reading the unittest mock docs because how you import the "patch_one" file matters as does the string you pass to patch().
There will definitely be a way to do this with Mock or MagicMock but I find them a bit hard to debug so I went the long way round.
I am trying to write a program that reads a configuration file, but while testing it I get this error:
self.connection_attempts = self.config_file.get('CONNECTION_ATTEMPTS', 'TIME')
AttributeError: 'list' object has no attribute 'get'
I am pretty sure it is something I don't get, but I have spent a few hours trying to understand where the problem is.
My __init__ method looks like this:
import simpleconfigparser

class ReportGenerator:
    def __init__(self):
        self.config_parser = simpleconfigparser.configparser()
        self.config_file = self.config_parser.read('config.ini')
        self.connection_attempts = self.config_file.get('CONNECTION_ATTEMPTS', 'TIME')
        self.connection_timeout = self.config_file.get('CONNECTION_TIMEOUT', 'TIMEOUT')
        self.report_destination_path = self.config_file.get('REPORT', 'REPORT_PATH')
This code uses the SimpleConfigParser package.
You want config_parser.get() not config_file.get(). config_parser.read() simply returns the list of config files successfully read after populating the config object. (Usually it is called config or cfg, not config_parser).
This list (config_file) serves no purpose in your code and you might as well not capture it at all.
from simpleconfigparser import simpleconfigparser

TIME = 5
TIMEOUT = 10
REPORT_PATH = '/tmp/'

class ReportGenerator:
    def __init__(self):
        self.config = simpleconfigparser()
        self.config.read('config.ini')
        self.connection_attempts = self.config.get('CONNECTION_ATTEMPTS', TIME)
        self.connection_timeout = self.config.get('CONNECTION_TIMEOUT', TIMEOUT)
        self.report_destination_path = self.config.get('REPORT', REPORT_PATH)
My guess would also be that you are using the default value in .get() the wrong way, but I cannot be certain with the information you have given.
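As a point of comparison, here is a minimal sketch using the standard library's configparser rather than SimpleConfigParser (whether SimpleConfigParser behaves identically is an assumption on my part). In the standard API, read() returns the list of files it parsed, the second positional argument of get() is the option name, and a default is supplied with the fallback keyword. The file contents and values are made up for the demo.

```python
import configparser
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.ini")
    with open(path, "w") as fh:
        fh.write("[CONNECTION_ATTEMPTS]\nTIME = 3\n")

    config = configparser.ConfigParser()
    # read() fills the parser and returns the list of files it could read.
    files_read = config.read(path)

    # get(section, option) looks up a value; it is returned as a string.
    attempts = config.get("CONNECTION_ATTEMPTS", "TIME")
    # fallback= is used when the section or option is missing.
    timeout = config.get("CONNECTION_TIMEOUT", "TIMEOUT", fallback="10")

print(files_read == [path], attempts, timeout)
```

The list returned by read() is exactly the object that raised AttributeError: 'list' object has no attribute 'get' in the question.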
This code is copied from http://code.google.com/p/closure-library/source/browse/trunk/closure/bin/build/source.py
The Source class's __str__ method refers to self._path.
Is it a special property of self?
I ask because I couldn't find where this variable is defined in the Source class.
import re

_BASE_REGEX_STRING = '^\s*goog\.%s\(\s*[\'"](.+)[\'"]\s*\)'
_PROVIDE_REGEX = re.compile(_BASE_REGEX_STRING % 'provide')
_REQUIRES_REGEX = re.compile(_BASE_REGEX_STRING % 'require')

# This line identifies base.js and should match the line in that file.
_GOOG_BASE_LINE = (
    'var goog = goog || {}; // Identifies this file as the Closure base.')


class Source(object):
    """Scans a JavaScript source for its provided and required namespaces."""

    def __init__(self, source):
        """Initialize a source.

        Args:
          source: str, The JavaScript source.
        """
        self.provides = set()
        self.requires = set()
        self._source = source
        self._ScanSource()

    def __str__(self):
        return 'Source %s' % self._path  #!!!!!! what is self._path !!!!

    def GetSource(self):
        """Get the source as a string."""
        return self._source

    def _ScanSource(self):
        """Fill in provides and requires by scanning the source."""
        # TODO: Strip source comments first, as these might be in a comment
        # block. RegExes can be borrowed from other projects.
        source = self.GetSource()
        source_lines = source.splitlines()
        for line in source_lines:
            match = _PROVIDE_REGEX.match(line)
            if match:
                self.provides.add(match.group(1))
            match = _REQUIRES_REGEX.match(line)
            if match:
                self.requires.add(match.group(1))

        # Closure's base file implicitly provides 'goog'.
        for line in source_lines:
            if line == _GOOG_BASE_LINE:
                if len(self.provides) or len(self.requires):
                    raise Exception(
                        'Base files should not provide or require namespaces.')
                self.provides.add('goog')


def GetFileContents(path):
    """Get a file's contents as a string.

    Args:
      path: str, Path to file.

    Returns:
      str, Contents of file.

    Raises:
      IOError: An error occurred opening or reading the file.
    """
    fileobj = open(path)
    try:
        return fileobj.read()
    finally:
        fileobj.close()
No, _path is just an attribute that may or may not be set on an object, like any other attribute. The leading underscore simply means that the author felt it was an internal detail of the object and didn't want it regarded as part of the public interface.
In this particular case, unless something is setting the attribute from outside that source file, it looks like it's simply a mistake. It won't do any harm unless anyone ever tries to call str() on a Source object and probably nobody ever does.
BTW, you seem to be thinking there is something special about self. The name self isn't special in any way: it's a convention to use this name for the first parameter of a method, but it is just a name like any other that refers to the object being processed. So if you could access self._path without causing an error you could access it equally well through any other name for the object.
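A small sketch makes both points concrete. It uses a cut-down stand-in for the Source class (only __init__ and __str__), and the 'a.js' value is made up for the demo.

```python
class Source(object):
    """Cut-down stand-in for the Closure Source class."""

    def __init__(self, source):
        self._source = source

    def __str__(self):
        # Fails unless something has set _path on this object.
        return 'Source %s' % self._path


src = Source("goog.provide('a');")

try:
    str(src)
    failed = False
except AttributeError:
    # _path was never assigned, so the lookup fails like any other attribute.
    failed = True

# Any code holding a reference can set the attribute; 'self' is not required.
alias = src
alias._path = 'a.js'
text = str(src)

print(failed, text)
```

The object behaves the same whether it is reached through self, src, or alias, since all of them are just names for the same object.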