I've recently learned about unittest.monkey.patch and its variants, and I'd like to use it to unit test for atomicity of a file read function. However, the patch doesn't seem to have any effect.
Here's my set-up. The method under scrutiny is roughly like so (abriged):
#local_storage.py
def read(uri):
with open(path, "rb") as file_handle:
result = file_handle.read()
return result
And the module that performs the unit tests (also abriged):
#test/test_local_storage.py
import unittest.mock
import local_storage
def _read_while_writing(io_handle, size=-1):
""" The patch function, to replace io.RawIOBase.read. """
_write_something_to(TestLocalStorage._unsafe_target_file) #Appends "12".
result = io_handle.read(size) #Should call the actual read.
_write_something_to(TestLocalStorage._unsafe_target_file) #Appends "34".
class TestLocalStorage(unittest.TestCase):
_unsafe_target_file = "test.txt"
def test_read_atomicity(self):
with open(self._unsafe_target_file, "wb") as unsafe_file_handle:
unsafe_file_handle.write(b"Test")
with unittest.mock.patch("io.RawIOBase.read", _read_while_writing): # <--- This doesn't work!
result = local_storage.read(TestLocalStorage._unsafe_target_file) #The actual test.
self.assertIn(result, [b"Test", b"Test1234"], "Read is not atomic.")
This way, the patch should ensure that every time you try to read it, the file gets modified just before and just after the actual read, as if it happens concurrently, thus testing for atomicity of our read.
The unit test currently succeeds, but I've verified with print statements that the patch function doesn't actually get called, so the file never gets the additional writes (it just says "Test"). I've also modified the code as to be non-atomic on purpose.
So my question: How can I patch the read function of an IO handle inside the local_storage module? I've read elsewhere that people tend to replace the open() function to return something like a StringIO, but I don't see how that could fix this problem.
I need to support Python 3.4 and up.
I've finally found a solution myself.
The problem is that mock can't mock any methods of objects that are written in C. One of these is the RawIOBase that I was encountering.
So indeed the solution was to mock open to return a wrapper around RawIOBase. I couldn't get mock to produce a wrapper for me, so I implemented it myself.
There is one pre-defined file that's considered "unsafe". The wrapper writes to this "unsafe" file every time any call is made to the wrapper. This allows for testing the atomicity of file writes, since it writes additional things to the unsafe file while writing. My implementation prevents this by writing to a temporary ("safe") file and then moving that file over the target file.
The wrapper has a special case for the read function, because to test atomicity properly it needs to write to the file during the read. So it reads first halfway through the file, then stops and writes something, and then reads on. This solution is now semi-hardcoded (in how far is halfway), but I'll find a way to improve that.
You can see my solution here: https://github.com/Ghostkeeper/Luna/blob/0e88841d19737fb1f4606917f86e3de9b5b9f29b/plugins/storage/localstorage/test/test_local_storage.py
Related
Okay folks lemme illustrate, I've this
def get_config_file(file='scrapers_conf.json'):
"""
Load the default .json config file
"""
return json.load(open(file))
and this function is called a lot, This will be on a server and every request will trigger this function at least 5 times, I've multiple scrapers running, each one is on the following shape.
I removed helper methods for convenience, the problem is, each scraper should have it's own request headers, payload, ... or use the default ones that lie in scrapers_conf.json
class Scraper(threading.Thread): # init is overriden and has set .conf
def run(self):
self.get()
def get(self):
# logic
The problem is that I'm getting the headers like
class Scraper(threading.Thread):
def run(self):
self.get()
def get(self):
headers = self.conf.get('headers') or get_config_file().get('headers')
so as you see, each single instance on each single request calls the get_config_file() function which I don't think is optimal in my case. I know about lru_cache but I don't think it's the optimal solution (correct me please!)
The config files are small, os.sys.getsizeof reports under 1 KB.
I'm thinking of just leaving it as is considering that reading a 1 KB ain't a problem.
Thanks in advance.
lru_cache(maxsize=None) sounds like the right way to do this; the maxsize=None makes it faster by turning off the LRU machinery.
The other way would be to call get_config_file() at the beginning of the program (in __init__, get, or in the place that instantiates the class), assign it to an attribute on each Scraper class and then always refer to self.config (or whatever). That has the advantage that you can skip reading the config file in unit tests — you can pass a test config directly into the class.
In this case, since the class already has a self.conf, it might be best to update that dictionary with the values from the file, rather than referring to two places in each of the methods.
I've totally forgot about #functools.cached_property
#cached_property
def get_config_file(file='scrapers_conf.json'):
"""
Load the default .json config file
"""
return json.load(open(file))
I was wondering if there was any difference between doing:
var1 = open(filename, 'w').write("Hello world!")
and doing:
var1 = open(filename, 'w')
var1.write("Hello world!")
var1.close()
I find that there is no need (AttributeError) if I try to run close() after using the first method (all in one line).
I was wondering if one way was actually any different/'better' than the other, and secondly, what is Python actually doing here? I understand that open() returns a file object, but how come running all of the code in one line automatically closes the file too?
Using with statement is preferred way:
with open(filename, 'w') as f:
f.write("Hello world!")
It will ensure the file object is closed outside the with block.
Let me example to you why your first instance wont work if you initial a close() method. This will be useful for your future venture into learning object orientated programming in Python
Example 1
When you run open(filename, 'w') , it will initialise and return an file handle object.
When you call for open(filename, 'w').write('helloworld'), you are calling the write method on the file object that you initiated. Since the write method do not return any value/object, var1 in your code above will be of NoneType
Example 2
Now in your second example, you are storing the file object as var1.
var1 will have the write method as well as the close method and hence it will work.
This is in contrast to what you have done in your first example.
falsetru have provided a good example of how you can read and write file using the with statement
Reading and Writing file using the with statement
to write
with open(filename, 'w') as f:
f.write("helloworld")
to read
with open(filename) as f:
for line in f:
## do your stuff here
Using nested with statements to read/write multiple files at once
Hi here's an update to your question on the comments. Not too sure if this is the most pythonic way. But if you will like to use the with statement to read/write mulitple files at the same time using the with statement. What you can do is the nest the with statement within one another
For instance :
with open('a.txt', 'r') as a:
with open('b.txt', 'w') as b:
for line in a:
b.write(line)
How and Why
The file object itself is a iterator. Therefore, you could iterator thru the file with a for loop. The file object contains the next() method, which, with each iteration will be called until the end of file is reached.
The with statement was introduced in python 2.5. Prior to python 2.5 to achieve the same effect, one have to
f = open("hello.txt")
try:
for line in f:
print line,
finally:
f.close()
Now the with statement does that automatically for you. The try and finally statement are in place to ensure if there is any expection/error raised in the for loop, the file will be closed.
source : Python Built-in Documentation
Official documentations
Using the with statement, f.close() will be called automatically when it finishes. https://docs.python.org/2/tutorial/inputoutput.html
Happy venture into python
cheers,
biobirdman
#falsetru's answer is correct, in terms of telling you how you're "supposed" to open files. But you also asked what the difference was between the two approaches you tried, and why they do what they do.
The answer to those parts of your question is that the first approach doesn't do what you probably think it does. The following code
var1 = open(filename, 'w').write("Hello world!")
is roughly equivalent to
tmp = open(filename, 'w')
var1 = tmp.write("Hello world!")
del tmp
Notice that the open() function returns a file object, and that file object has a write() method. But write() doesn't have any return value, so var1 winds up being None. From the official documentation for file.write(str):
Write a string to the file. There is no return value. Due to buffering, the string may not actually show up in the file until the flush() or close() method is called.
Now, the reason you don't need to close() is that the main implementation of Python (the one found at python.org, also called CPython) happens to garbage-collect objects that no longer have references to them, and in your one-line version, you don't have any reference to the file object once the statement completes. You'll find that your multiline version also doesn't strictly need the close(), since all references will be cleaned up when the interpreter exits. But see answers to this question for a more detailed explanation about close() and why it's still a good idea to use it, unless you're using with instead.
Here's the code
import csv
def csv_dict_reader(file_obj):
"""
read a CSV file using csv.DictReader
"""
reader = csv.DictReader(file_obj, delimiter=',')
for line in reader:
print(line['first_name']),
print(line['last_name']),
if __name__== "__main__":
with open("dummy.csv") as f_obj:
csv_dict_reader(f_obj)
I wanted to try and do a quick breakdown, to see if I understand how exactly this works. Here we go:
1) import csv brings in the csv method
2) We define a function, which takes 'file_obj' as its argument
3) the reader variable makes a call to a function within csv called "DictReadre", which subsequently takes arguments from 'file_obj' and specifies a 'delimiter'
4) I get confused with this for loop, why is that we don't have to define line beforehand? Is it that line is already defined as part of 'reader'?
5) I'm really confused when it comes to 'name' and 'main', are these somehow related to how we specify a 'file_obj'? I'm equally confused with how we end up specifying the 'file_obj' in the end; I've been assuming 'f_obj' somehow manages to fill this role.
--edit--
Awesome, this is starting to make a whole lot more sense to me. So, when I make a 'class' call to DictReader(), I'm creating an instance of it in the variable 'reader'?
Maybe I'm going too far off the beaten path, but what in the DictReader() class allows for it to determine the structure of fields like 'last_name' or 'first_name'? I'm assuming it has something to do with how CSV files are structures, but I'm not entirely certain.
1) import csv brings in the csv method
Well, not quite; it brings in the csv module.*
* … which includes the csv.DictReader class, which has a csv.DictReader.__next__ method that you call implicitly, but that's not important here.
2) We define a function, which takes 'file_obj' as its argument
Exactly.*
* Technically, there's a distinction between arguments and parameters, or between actual vs. formal arguments/parameters. You probably don't want to learn that yet. But if you do, formal parameters go in function definitions; actual arguments go in function calls.
3) the reader variable makes a call to a function within csv called "DictReadre", which subsequently takes arguments from 'file_obj' and specifies a 'delimiter'
Again, not quite; it makes a call to the class DictReader. Calling a class constructs an instance of that class. Arguments are passed the same way as in a function call.* You can see the parameters that DictReader takes by looking it up in the help.
* In fact, constructing a class actually calls the class's __new__ method, and then (usually) its __init__ method. But that's only important when you're writing new classes; when you're just using classes, you don't care about __new__ or __init__. That's why the documentation shows, e.g., class csv.DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds).
4) I get confused with this for loop, why is that we don't have to define line beforehand? Is it that line is already defined as part of 'reader'?
No, that's exactly what for statements do: each time through the loop, line gets assigned to the next value in reader. The tutorial explains in more detail.
A simpler example may help:
for a in [1, 2, 3]:
print(a)
This assigns 1 to a, prints out that 1, then assigns 2 to a, prints out that 2, then assigns 3 to a, prints out that 3, then it's done.
Also, you may be confused by other languages, which need variables to be declared before they can be used. Python doesn't do that; you can assign to any name you want anywhere you want, and if there wasn't a variable with that name, there is now.
5) I'm really confused when it comes to 'name' and 'main'
This is a special case where you have to learn something reasonably advanced a little early.
The same source code file can be used as a script, to run on the command line, and also as a module, to be imported by other code. The way you distinguish between the two is by checking __name__. If you're being run as a script, it will be '__main__'. If you're being used as a module by some other script, it will be whatever the name of your module is.
So, idiomatically, you define all your public classes and functions and constants that might be useful to someone else, then you do if __name__ == '__main__': and put all the "top-level script" code there that you want to execute if someone runs you as a script.
Again, the tutorial explains in more detail.
I really like the syntax "with open('in_file'') as f".
I want to use that syntax for my own resources which must be opened and closed.
However, I do not understand how to change my open() method to enable the 'with' syntax.
I can (and do) use the contextlib.closing() approach but it becomes a bit bulky after repeated use.
So, I will ask my question below in relation to shelve.open().
I am not proposing a change to the shelve module but instead am using it because the source code is readily available to all of you.
There is nothing special about shelve.open() vs. other standard library resources that require closing: socket.socket(), sqlite3.connect(), urllib2.urlopen(), etc.
import contextlib, inspect, shelve, sys
#print(inspect.getsource(open)) # can not see how it was done here :-(
print('-' * 40)
# Given that we can view the source for the shelve module:
print(inspect.getsource(shelve))
print('-' * 40)
# Given that we can view the docs for the shelve module:
print(shelve.__doc__)
#print('-' * 40)
# Given that the desired syntax is Pythonic but is not supported:
#with shelve.open('test_shelve') as my_shelve:
# my_shelve['fact_number_1'] = "There's a dead fish on the landing."
# Given that the required syntax is convoluted and
# takes programmer attention away from the task at hand:
with contextlib.closing(shelve.open('test_shelve')) as my_shelve:
my_shelve['fact_number_2'] = "There's another dead fish on the landing."
# Q: What changes would need to made to shelve.open() to allow the
# 'with shelve.open(x) as y' syntax?
I am not really interested in an extra wrapper with a different name. Using contextlib.closing() is easier, safer, and more intuitive than that. What I am really interested in is creating a single open() method that can be called either with or without 'with'.
So, to successfully answer this question, you need to take the source code for the shelve module and show what changes would need to be made to shelve.open() to have a single method that can be used either with or without 'with' (like the builtin open() or the Python3 urllib.urlopen()).
The biggest problem here is that if you do
shelf = the_function_you_want()
the function you want has to return the shelf, but if you do
with the_function_you_want() as shelf:
the function you want has to return a context manager. That means you need to return a shelf that is also a context manager, which in turn means you either need to make a shelf subclass or monkey-patch Shelf. It's probably better to make a subclass:
class ContextManagerShelf(shelve.DbfilenameShelf):
def __enter__(self):
return self
def __exit__(self, *exc_info):
self.close()
Then you can use ContextManagerShelf as a context manager or not. The signature is the same as shelve.open. If you want, you can also make an open function to go with it.
I have to open a file-like object in python (it's a serial connection through /dev/) and then close it. This is done several times in several methods of my class. How I WAS doing it was opening the file in the constructor, and then closing it in the destructor. I'm getting weird errors though and I think it has to do with the garbage collector and such, I'm still not used to not knowing exactly when my objects are being deleted =\
The reason I was doing this is because I have to use tcsetattr with a bunch of parameters each time I open it and it gets annoying doing all that all over the place. So I want to implement an inner class to handle all that so I can use it doing
with Meter('/dev/ttyS2') as m:
I was looking online and I couldn't find a really good answer on how the with syntax is implemented. I saw that it uses the __enter__(self) and __exit(self)__ methods. But is all I have to do implement those methods and I can use the with syntax? Or is there more to it?
Is there either an example on how to do this or some documentation on how it's implemented on file objects already that I can look at?
Those methods are pretty much all you need for making the object work with with statement.
In __enter__ you have to return the file object after opening it and setting it up.
In __exit__ you have to close the file object. The code for writing to it will be in the with statement body.
class Meter():
def __init__(self, dev):
self.dev = dev
def __enter__(self):
#ttysetattr etc goes here before opening and returning the file object
self.fd = open(self.dev, MODE)
return self
def __exit__(self, type, value, traceback):
#Exception handling here
close(self.fd)
meter = Meter('dev/tty0')
with meter as m:
#here you work with the file object.
m.fd.read()
Easiest may be to use standard Python library module contextlib:
import contextlib
#contextlib.contextmanager
def themeter(name):
theobj = Meter(name)
try:
yield theobj
finally:
theobj.close() # or whatever you need to do at exit
# usage
with themeter('/dev/ttyS2') as m:
# do what you need with m
m.read()
This doesn't make Meter itself a context manager (and therefore is non-invasive to that class), but rather "decorates" it (not in the sense of Python's "decorator syntax", but rather almost, but not quite, in the sense of the decorator design pattern;-) with a factory function themeter which is a context manager (which the contextlib.contextmanager decorator builds from the "single-yield" generator function you write) -- this makes it so much easier to separate the entering and exiting condition, avoids nesting, &c.
The first Google hit (for me) explains it simply enough:
http://effbot.org/zone/python-with-statement.htm
and the PEP explains it more precisely (but also more verbosely):
http://www.python.org/dev/peps/pep-0343/