Why can you use open() as a context manager?

From Python's source code of open, I think open is just a normal function.
Why can we use it like below?
with open('what_are_context_managers.txt', 'r') as infile:
    for line in infile:
        print('> {}'.format(line))
It neither implements __enter__ nor __exit__, nor uses the contextlib.contextmanager decorator.

You are not using the open function as a context manager. It is the result of the open(...) call expression that is the context manager. open() returns a file object, and it is that object that has __enter__ and __exit__ methods; see the io.IOBase documentation:
IOBase is also a context manager and therefore supports the with statement.
You can read the with statement like this:
_context_manager = open('what_are_context_managers.txt', 'r')
with _context_manager as infile:
Note that it is the return value of _context_manager.__enter__() that ends up being assigned to infile here. For file objects, file.__enter__() returns self, so you can get access to the same object that way.
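To make that concrete, here is a rough, simplified sketch of what the with statement does with the file object (the real protocol, per PEP 343, passes exception details to __exit__ and honors its return value):

# Rough, simplified desugaring of: with open(...) as infile: ...
_context_manager = open('what_are_context_managers.txt', 'r')
infile = _context_manager.__enter__()   # file objects return self here
try:
    for line in infile:
        print('> {}'.format(line))
finally:
    # Runs however the block exits; for file objects this closes the file.
    # (When an exception is propagating, the real protocol passes
    # (exc_type, exc_value, traceback) instead of three Nones.)
    _context_manager.__exit__(None, None, None)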
As a side-note; you got the wrong open() function. The actual definition of the open() built-in is an alias for io.open(), see the _iomodule.c source code. The alias is set in initstdio() in pylifecycle.c (where io.OpenWrapper is itself an alias for _io.open). And yes, the documentation states the alias points the other way for end-user ease.

Related

Mocking "with open()"

I am trying to unit test a method that reads lines from a file and processes them.
with open([file_name], 'r') as file_list:
    for line in file_list:
        # Do stuff
I tried several approaches described in other questions, but none of them seems to work for this case. I also don't quite understand how Python iterates over the lines of a file object; does it internally use file_list.readlines()?
This way didn't work:
with mock.patch('[module_name].open') as mocked_open:  # also tried __builtin__ instead of module_name
    mocked_open.return_value = 'line1\nline2'
I got an
AttributeError: __exit__
Maybe because the with statement needs this special method to close the file?
This code makes file_list a MagicMock. How do I store data on this MagicMock so I can iterate over it?
with mock.patch("__builtin__.open", mock.mock_open(read_data="data")) as mock_file:
The return value of mock_open (until Python 3.7.1) doesn't provide a working __iter__ method, which may make it unsuitable for testing code that iterates over an open file object.
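On Python 3.7.1 and newer, where that was fixed, iterating over the mocked file should work; a minimal sketch (the patched target and file name are illustrative):

from unittest import mock

with mock.patch('builtins.open', mock.mock_open(read_data='line1\nline2\n')):
    with open('fake.txt') as f:
        for line in f:          # works on 3.7.1+, where __iter__ was added
            print(line.rstrip())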
Instead, I recommend refactoring your code to take an already opened file-like object. That is, instead of
def some_method(file_name):
    with open(file_name, 'r') as file_list:
        for line in file_list:
            # Do stuff
...
some_method(file_name)
write it as
def some_method(file_obj):
    for line in file_obj:
        # Do stuff
...
with open(file_name, 'r') as file_obj:
    some_method(file_obj)
This turns a function that has to perform IO into a pure(r) function that simply iterates over any file-like object. To test it, you don't need to mock open or hit the file system in any way; just create a StringIO object to use as the argument:
def test_it(self):
    f = StringIO.StringIO("line1\nline2\n")
    some_method(f)
(In Python 3, use io.StringIO instead of the StringIO module.)
(If you still feel the need to write and test a wrapper like
def some_wrapper(file_name):
    with open(file_name, 'r') as file_obj:
        some_method(file_obj)
note that you don't need the mocked open to do anything in particular. You test some_method separately, so the only thing you need to do to test some_wrapper is verify that the return value of open is passed to some_method. open, in this case, can be a plain old mock with no special behavior.)
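For example, a minimal sketch of such a wrapper test (assuming the code under test lives in a module named mymodule; the names are illustrative):

import unittest
from unittest import mock

import mymodule  # hypothetical module containing some_wrapper and some_method


class TestSomeWrapper(unittest.TestCase):
    def test_wrapper_passes_open_file_to_some_method(self):
        with mock.patch.object(mymodule, 'open', create=True) as mocked_open, \
                mock.patch.object(mymodule, 'some_method') as mocked_method:
            mymodule.some_wrapper('fake.txt')
            # The object handed to some_method is whatever the mocked
            # open(...).__enter__() returned; that is all we verify.
            mocked_method.assert_called_once_with(
                mocked_open.return_value.__enter__.return_value)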

Which objects does the with statement apply to? [duplicate]

I'm trying to understand if there is a difference between these, and what that difference might be.
Option One:
file_obj = open('test.txt', 'r')
with file_obj as in_file:
    print in_file.readlines()
Option Two:
with open('test.txt', 'r') as in_file:
    print in_file.readlines()
I understand that with Option One, the file_obj is in a closed state after the with block.
I don't know why no one has mentioned this yet, because it's fundamental to the way with works. As with many language features in Python, with behind the scenes calls special methods, which are already defined for built-in Python objects and can be overridden by user-defined classes. In with's particular case (and context managers more generally), the methods are __enter__ and __exit__.
Remember that in Python everything is an object -- even literals. This is why you can do things like 'hello'[0]. Thus, it does not matter whether you use the file object directly as returned by open:
with open('filename.txt') as infile:
for line in infile:
print(line)
or store it first with a different name (for example to break up a long line):
the_file = open('filename' + some_var + '.txt')
with the_file as infile:
for line in infile:
print(line)
Because the end result is that the_file, infile, and the return value of open all point to the same object, and that's what with is calling the __enter__ and __exit__ methods on. The built-in file object's __exit__ method is what closes the file.
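You can verify that they are the same object with a quick check:

the_file = open('filename.txt')
with the_file as infile:
    print(infile is the_file)  # True: file.__enter__() returns self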
These behave identically. As a general rule, the meaning of Python code is not changed by assigning an expression to a variable in the same scope.
This is the same reason that these are identical:
f = open("myfile.txt")
vs
filename = "myfile.txt"
f = open(filename)
Regardless of whether you add an alias, the meaning of the code stays the same. The context manager has a deeper meaning than passing an argument to a function, but the principle is the same: the context manager magic is applied to the same object, and the file gets closed in both cases.
The only reason to choose one over the other is if you feel it helps code clarity or style.
There is no difference between the two - either way the file is closed when you exit the with block.
The second example you give is the typical way the files are used in Python 2.6 and newer (when the with syntax was added).
You can verify that the first example also works in a REPL session like this:
>>> file_obj = open('test.txt', 'r')
>>> file_obj.closed
False
>>> with file_obj as in_file:
...     print in_file.readlines()
<Output>
>>> file_obj.closed
True
So after the with blocks exits, the file is closed.
Normally the second example is how you would do this sort of thing, though.
There's no reason to create that extra variable file_obj... anything that you might want to do with it after the end of the with block you could just use in_file for, because it's still in scope.
>>> in_file
<closed file 'test.txt', mode 'r' at 0x03DC5020>
If you just fire up Python and use either of those options, the net effect is the same, as long as the base instance of Python's file object is not changed. (In both options the file is closed at the end of the with block, as you have already observed; the only difference is that in Option One the name file_obj still refers to the closed file afterwards.)
There can be differences with use cases with a context manager however. Since file is an object, you can modify it or subclass it.
You can also open a file by just calling file(file_name) showing that file acts like other objects (but no one opens files that way in Python unless it is with with):
>>> f=open('a.txt')
>>> f
<open file 'a.txt', mode 'r' at 0x1064b5ae0>
>>> f.close()
>>> f=file('a.txt')
>>> f
<open file 'a.txt', mode 'r' at 0x1064b5b70>
>>> f.close()
More generally, opening and closing some resource called the_thing (commonly a file, but it can be anything) follows these steps:
set up the_thing              # resource specific: open, or call the obj
try:                          # generically, __enter__
    yield pieces from the_thing
except:
    react if the_thing is broken
finally:
    put the_thing away        # generically, __exit__
You can more easily change the flow of those subelements using the context manager vs procedural code woven between open and the other elements of the code.
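For instance, here is a minimal runnable sketch of that pattern using contextlib.contextmanager, with a plain file standing in for the_thing:

from contextlib import contextmanager

@contextmanager
def managed_file(name):
    the_thing = open(name)     # set up the_thing
    try:
        yield the_thing        # the body of the with block runs here
    finally:
        the_thing.close()      # put the_thing away, even on errors

# Usage:
# with managed_file('a.txt') as f:
#     print(f.read())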
Since Python 2.5, file objects have __enter__ and __exit__ methods:
>>> f=open('a.txt')
>>> f.__enter__
<built-in method __enter__ of file object at 0x10f836780>
>>> f.__exit__
<built-in method __exit__ of file object at 0x10f836780>
The default Python file object uses those methods in this fashion:
__init__(...)               # performs the initialization desired
__enter__() -> self         # in the case of file(), returns an open file handle
__exit__(*excinfo) -> None  # in the case of file(), closes the file
These methods can be changed for your own use to modify how a resource is treated when it is opened or closed. A context manager makes it really easy to modify what happens when you open or close a file.
Trivial example:
class Myopen(object):
    def __init__(self, fn, opening='', closing='', mode='r', buffering=-1):
        # set up the_thing
        if opening:
            print(opening)
        self.closing = closing
        self.f = open(fn, mode, buffering)

    def __enter__(self):
        # set up the_thing
        # could lock the resource here
        return self.f

    def __exit__(self, exc_type, exc_value, traceback):
        # put the_thing away
        # unlock, or do whatever context-applicable putting-away the_thing requires
        self.f.close()
        if self.closing:
            print(self.closing)
Now try that:
>>> with Myopen('a.txt', opening='Hello', closing='Good Night') as f:
...     print f.read()
...
Hello
[contents of the file 'a.txt']
Good Night
Once you have control of entry and exit to a resource, there are many use cases:
Lock a resource to access it and use it; unlock when you are done
Make a quirky resource (like a memory file, database or web page) act more like a straight file resource
Open a database and rollback if there is an exception but commit all writes if there are no errors
Temporarily change the context of a floating point calculation
Time a piece of code
Suppress or propagate exceptions raised inside the block by returning True or False from the __exit__ method.
You can read more examples in PEP 343.
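As one concrete illustration of the last few use cases, here is a minimal timing context manager sketch (the class name is illustrative):

import time

class Timer(object):
    # Context manager that reports how long its with block took.
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.time() - self.start
        print('block took {:.3f} seconds'.format(self.elapsed))
        return False  # do not suppress exceptions from the block

# Usage:
# with Timer():
#     do_expensive_work()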
It is remarkable that with works even if return or sys.exit() is called inside the block (that means __exit__ is called anyway):
#!/usr/bin/env python
import sys

class MyClass:
    def __enter__(self):
        print("Enter")
        return self

    def __exit__(self, type, value, trace):
        print("type: {} | value: {} | trace: {}".format(type, value, trace))

# main code:
def myfunc(msg):
    with MyClass() as sample:
        print(msg)
        # also works if you uncomment this:
        # sys.exit(0)
        return

myfunc("Hello")
The return version will show:
Enter
Hello
type: None | value: None | trace: None
The sys.exit(0) version will show:
Enter
Hello
type: <class 'SystemExit'> | value: 0 | trace: <traceback object at 0x7faca83a7e00>

with and closing of files in Python

I have read that a file opened like this is closed automatically when leaving the with block:
with open("x.txt") as f:
data = f.read()
do something with data
yet when opening from the web, I need this:
from contextlib import closing
from urllib.request import urlopen
with closing(urlopen('http://www.python.org')) as page:
    for line in page:
        print(line)
Why, and what is the difference? (I am using Python 3.)
The details get a little technical, so let's start with the simple version:
Some types know how to be used in a with statement. File objects, like what you get back from open, are an example of such a type. As it turns out, the objects that you get back from urllib.request.urlopen, are also an example of such a type, so your second example could be written the same way as the first.
But some types don't know how to be used in a with statement. The closing function is designed to wrap such types: as long as they have a close method, it will call that close method when you exit the with statement.
Of course some types don't know how to be used in a with statement, and also can't be used with closing because their cleanup method isn't named close (or because cleaning them up is more complicated than just closing them). In that case, you need to write a custom context manager. But even that isn't usually that hard.
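For example, a minimal sketch of such a custom context manager (the resource and its shutdown() cleanup method are hypothetical):

from contextlib import contextmanager

@contextmanager
def managed(resource):
    # Works for resources whose cleanup method is not named close(),
    # where contextlib.closing() would not apply.
    try:
        yield resource
    finally:
        resource.shutdown()  # hypothetical cleanup method

# Usage:
# with managed(make_resource()) as r:
#     r.do_work()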
In technical terms:
A with statement requires a context manager, an object with __enter__ and __exit__ methods. It will call the __enter__ method, and give you the value returned by that method in the as clause, and it will then call the __exit__ method at the end of the with statement.
File objects inherit from io.IOBase, which is a context manager whose __enter__ method returns itself, and whose __exit__ calls self.close().
The object returned by urlopen is (assuming an http or https URL) an HTTPResponse, which, as the docs say, "can be used with a with statement".
The closing function:
Return a context manager that closes thing upon completion of the block. This is basically equivalent to:
@contextmanager
def closing(thing):
    try:
        yield thing
    finally:
        thing.close()
It's not always 100% clear in the docs which types are context managers and which types aren't. Especially since there's been a major drive since 3.1 to make everything that could be a context manager into one (and, for that matter, to make everything that's mostly-file-like into an actual IOBase if it makes sense), but it's still not 100% complete as of 3.4.
You can always just try it and see. If you get an AttributeError: __exit__, then the object isn't usable as a context manager. If you think it should be, file a bug suggesting the change. If you don't get that error, but the docs don't mention that it's legal, file a bug suggesting the docs be updated.
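A lightweight way to check without actually running a with statement is to look for the protocol methods directly; a minimal sketch:

def is_context_manager(obj):
    # True if obj can be used in a with statement.
    return hasattr(obj, '__enter__') and hasattr(obj, '__exit__')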
You don't. urlopen('http://www.python.org') returns a context manager too:
with urlopen('http://www.python.org') as page:
This is documented on the urllib.request.urlopen() page:
For ftp, file, and data urls and requests explicitly handled by legacy URLopener and FancyURLopener classes, this function returns a urllib.response.addinfourl object which can work as context manager [...].
Emphasis mine. For HTTP responses, an http.client.HTTPResponse object is returned, which is also a context manager:
The response is an iterable object and can be used in a with statement.
The Examples section also uses the object as a context manager:
As the python.org website uses utf-8 encoding as specified in its meta tag, we will use the same for decoding the bytes object.
>>> with urllib.request.urlopen('http://www.python.org/') as f:
... print(f.read(100).decode('utf-8'))
...
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtm
Objects returned by open() are context managers too; they implement the special methods object.__enter__() and object.__exit__().
The contextlib.closing() documentation uses an example with urlopen() that is out of date; in Python 2 the predecessor for urllib.request.urlopen() did not produce a context manager and you needed to use that tool to auto-close the connection with a context manager. This was fixed with issues 5418 and 12365, but that example was not updated. I created issue 22755 asking for a different example.

What's the difference between io.open() and os.open() on Python?

I realised that the open() function I've been using was an alias to io.open() and that importing * from os would shadow it.
What's the difference between opening files through the io module and os module?
io.open() is the preferred, higher-level interface to file I/O. It wraps the OS-level file descriptor in an object that you can use to access the file in a Pythonic manner.
os.open() is just a wrapper for the lower-level POSIX syscall. It takes less symbolic (and more POSIX-y) arguments, and returns the file descriptor (a number) that represents the opened file. It does not return a file object; the returned value will not have read() or write() methods.
From the os.open() documentation:
This function is intended for low-level I/O. For normal usage, use the built-in function open(), which returns a “file object” with read() and write() methods (and many more).
Absolutely everything:
os.open() takes a filename as a string, the file mode as a bitwise mask of attributes, and an optional argument that describes the file permission bits, and returns a file descriptor as an integer.
io.open() takes a filename as a string or a file descriptor as an integer, the file mode as a string, and optional arguments that describe the file encoding, buffering used, how encoding errors and newlines are handled, and if the underlying FD is closed when the file is closed, and returns some descendant of io.IOBase.
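To make the contrast concrete, a minimal sketch (the file name is illustrative):

import io
import os

# Low level: os.open() returns an integer file descriptor.
fd = os.open('example.txt', os.O_WRONLY | os.O_CREAT, 0o644)
os.write(fd, b'hello\n')   # raw fds need the os-level calls
os.close(fd)

# High level: io.open() (the built-in open() on Python 3) returns a
# file object with read()/write() methods.
with io.open('example.txt', 'r', encoding='utf-8') as f:
    print(f.read())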
os.open is very similar to open() from C in Unix. You're unlikely to want to use it unless you're doing something much more low-level. It gives you an actual file descriptor (as in, a number, not an object).
io.open is your basic Python open() and what you want to use just about all the time.
To add to the existing answers:
I realised that the open() function I've been using was an alias to io.open()
open() == io.open() in Python 3 only. In Python 2 they are different.
While with open() in Python we can obtain an easy-to-use file object with handy read() and write() methods, on the OS level files are accessed using file descriptors (or file handles in Windows). Thus, os.open() should be used implicitly under the hood. I haven't examined Python source code in this regard, but the documentation for the opener parameter, which was added for open() in Python 3.3, says:
A custom opener can be used by passing a callable as opener. The underlying file descriptor for the file object is then obtained by calling opener with (file, flags). opener must return an open file descriptor (passing os.open as opener results in functionality similar to passing None).
So os.open() is the default opener for open(), and we also have the ability to specify a custom wrapper around it if file flags or mode need to be changed. See the documentation for open() for an example of a custom opener, which opens a file relative to a given directory.
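For reference, here is a sketch along the lines of the example in the docs, which opens files relative to a given directory via dir_fd (directory and file names are illustrative):

import os

dir_fd = os.open('somedir', os.O_RDONLY)

def opener(path, flags):
    return os.open(path, flags, dir_fd=dir_fd)

with open('spamspam.txt', 'w', opener=opener) as f:
    print('This will be written to somedir/spamspam.txt', file=f)

os.close(dir_fd)  # don't leak a file descriptor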
In Python 2, the built-in open and io.open were different (io.open was newer and supported more things). In Python 3, open and io.open are now the same thing (they got rid of the old built-in open), so you should always use open. Code that needs to be compatible with Python 2 and 3 might have a reason to use io.open.
The code below validates this:
import io

with io.open("../input/files.txt") as f:
    text = f.read().lower()

with open('../input/files.txt', encoding='utf-8') as f2:
    text2 = f2.read().lower()

print(type(f))
print(type(f2))
# <class '_io.TextIOWrapper'>
# <class '_io.TextIOWrapper'>
Database and system application developers usually use open instead of fopen as the former provides finer control on when, what and how the memory content should be written to its backing store (i.e., file on disk).
In Unix-like operating systems, open is used to open regular files, socket end-points, devices, pipes, etc. A positive file descriptor number is returned for every successful open call. It provides a consistent API and framework for checking event notification, etc., on a variety of these objects.
However, fopen is a standard C function and is normally used to open regular files and return a FILE data structure. fopen will eventually call open. fopen is good enough for normal usage, as developers do not need to worry about when to flush or sync memory content to the disk and do not need event notification.
The os.open() method opens the file file and sets various flags according to flags, and possibly its mode according to mode. The default mode is 0o777 (octal), and the current umask value is first masked out. This method returns the file descriptor for the newly opened file.
The io.open() method, by contrast, opens a file in the mode specified by the string mode. It returns a new file object, or raises an OSError if the file cannot be opened.

How do you check if an object is an instance of 'file'?

It used to be in Python (2.6) that one could ask:
isinstance(f, file)
but in Python 3.0 file was removed.
What is the proper method for checking to see if a variable is a file now? The What's New docs don't mention this...
def read_a_file(f):
    try:
        contents = f.read()
    except AttributeError:
        # f is not a file
        pass
Substitute whatever methods you plan to use for read. This is optimal if you expect that you will be passed a file-like object more than 98% of the time. If you expect that you will be passed a non-file-like object more often than 2% of the time, then the correct thing to do is:
def read_a_file(f):
    if hasattr(f, 'read'):
        contents = f.read()
    else:
        # f is not a file
        pass
This is exactly what you would do if you did have access to a file class to test against. (And FWIW, I too have file on 2.6.) Note that this code works in 3.x as well.
In Python 3 you can refer to io instead of file and write:
import io
isinstance(f, io.IOBase)
Typically, you don't need to check an object's type; you can use duck typing instead, i.e., just call f.read() directly and allow any exceptions to propagate. It is either a bug in your code or a bug in the caller's code; e.g., json.load() raises AttributeError if you give it an object that has no read attribute.
If you need to distinguish between several acceptable input types; you could use hasattr/getattr:
def read(file_or_filename):
    readfile = getattr(file_or_filename, 'read', None)
    if readfile is not None:  # got a file
        return readfile()
    with open(file_or_filename) as file:  # got a filename
        return file.read()
If you want to support a case where file_or_filename may have a read attribute that is set to None, then you could use try/except over file_or_filename.read -- note: no parens, the call is not made -- e.g., ElementTree._get_writer().
If you want to check certain guarantees e.g., that only one single system call is made (io.RawIOBase.read(n) for n > 0) or there are no short writes (io.BufferedIOBase.write()) or whether read/write methods accept text data (io.TextIOBase) then you could use isinstance() function with ABCs defined in io module e.g., look at how saxutils._gettextwriter() is implemented.
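For instance, a small sketch using those ABCs (the file name is illustrative):

import io

with open('example.txt', 'w') as f:
    print(isinstance(f, io.TextIOBase))  # True: reads/writes str
    print(isinstance(f, io.RawIOBase))   # False: not an unbuffered raw stream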
Works for me on Python 2.6... Are you in a strange environment where builtins aren't imported by default, or where somebody has done del file, or something?
