with and closing of files in Python - python

I have read that a file opened like this is closed automatically when leaving the with block:
with open("x.txt") as f:
    data = f.read()
    # do something with data
Yet when opening a URL from the web, I need this:
from contextlib import closing
from urllib.request import urlopen
with closing(urlopen('http://www.python.org')) as page:
    for line in page:
        print(line)
Why, and what is the difference? (I am using Python 3.)

The details get a little technical, so let's start with the simple version:
Some types know how to be used in a with statement. File objects, like what you get back from open, are an example of such a type. As it turns out, the objects that you get back from urllib.request.urlopen, are also an example of such a type, so your second example could be written the same way as the first.
But some types don't know how to be used in a with statement. The closing function is designed to wrap such types—as long as they have a close method, it will call their close method when you exit the with statement.
Of course some types don't know how to be used in a with statement, and also can't be used with closing because their cleanup method isn't named close (or because cleaning them up is more complicated than just closing them). In that case, you need to write a custom context manager. But even that isn't usually that hard.
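As a minimal sketch of that last case, here is a custom context manager for a resource whose cleanup method is called disconnect rather than close. The Connection class and the disconnecting helper are hypothetical names invented for this illustration; they are not part of any library.

```python
from contextlib import contextmanager


class Connection:
    """Hypothetical resource whose cleanup method is not named close()."""

    def __init__(self):
        self.connected = True

    def disconnect(self):
        self.connected = False


@contextmanager
def disconnecting(conn):
    # Hand the resource to the with block, then always clean it up,
    # even if the block raises.
    try:
        yield conn
    finally:
        conn.disconnect()


with disconnecting(Connection()) as conn:
    was_connected_inside = conn.connected

print(was_connected_inside, conn.connected)
```

The shape is the same as closing; only the name of the cleanup call differs.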
In technical terms:
A with statement requires a context manager, an object with __enter__ and __exit__ methods. It will call the __enter__ method, and give you the value returned by that method in the as clause, and it will then call the __exit__ method at the end of the with statement.
File objects inherit from io.IOBase, which is a context manager whose __enter__ method returns itself, and whose __exit__ calls self.close().
The object returned by urlopen is (assuming an http or https URL) an HTTPResponse, which, as the docs say, "can be used with a with statement".
The closing function:
Return a context manager that closes thing upon completion of the block. This is basically equivalent to:
from contextlib import contextmanager

@contextmanager
def closing(thing):
    try:
        yield thing
    finally:
        thing.close()
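To make the protocol described above concrete, here is a small sketch with a hypothetical Traced class (a name made up for this example) that records when each hook runs and shows that the as clause receives whatever __enter__ returns.

```python
class Traced:
    """Hypothetical context manager that records when its hooks run."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("enter")
        # Whatever is returned here is what the as clause receives.
        return "value for as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append("exit")
        return False  # do not suppress exceptions


events = []
with Traced(events) as value:
    events.append("body")

print(events, value)
```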
It's not always 100% clear in the docs which types are context managers and which types aren't. Especially since there's been a major drive since 3.1 to make everything that could be a context manager into one (and, for that matter, to make everything that's mostly-file-like into an actual IOBase if it makes sense), but it's still not 100% complete as of 3.4.
You can always just try it and see. If you get an AttributeError: __exit__, then the object isn't usable as a context manager. If you think it should be, file a bug suggesting the change. If you don't get that error, but the docs don't mention that it's legal, file a bug suggesting the docs be updated.
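If you would rather check without actually entering the object (entering a file-like object will close it on exit), a rough sketch is to look for the two special methods on the object's type, which is where the with statement looks them up. The helper name looks_like_context_manager is my own invention. Also note that, as far as I know, recent Python versions report a failed with as a TypeError rather than an AttributeError.

```python
import contextlib
import io


def looks_like_context_manager(obj):
    """Return True if the type of obj defines both context manager hooks."""
    cls = type(obj)
    return hasattr(cls, "__enter__") and hasattr(cls, "__exit__")


results = {
    "StringIO": looks_like_context_manager(io.StringIO()),
    "plain object": looks_like_context_manager(object()),
    "closing(...)": looks_like_context_manager(contextlib.closing(io.StringIO())),
}
print(results)
```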

You don't. urlopen('http://www.python.org') returns a context manager too:
with urlopen('http://www.python.org') as page:
This is documented on the urllib.request.urlopen() page:
For ftp, file, and data urls and requests explicitly handled by legacy URLopener and FancyURLopener classes, this function returns a urllib.response.addinfourl object which can work as context manager [...].
Emphasis mine. For HTTP responses, an http.client.HTTPResponse object is returned, which is also a context manager:
The response is an iterable object and can be used in a with statement.
The Examples section also uses the object as a context manager:
As the python.org website uses utf-8 encoding as specified in its meta tag, we will use the same for decoding the bytes object.
>>> with urllib.request.urlopen('http://www.python.org/') as f:
... print(f.read(100).decode('utf-8'))
...
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtm
Objects returned by open() are context managers too; they implement the special methods object.__enter__() and object.__exit__().
The contextlib.closing() documentation uses an example with urlopen() that is out of date; in Python 2 the predecessor for urllib.request.urlopen() did not produce a context manager and you needed to use that tool to auto-close the connection with a context manager. This was fixed with issues 5418 and 12365, but that example was not updated. I created issue 22755 asking for a different example.

Related

Which python module contains file object methods?

While it is simple to search by using help for most methods that have a clear help(module.method) arrangement, for example help(list.extend), I cannot work out how to look up the method .readline() in python's inbuilt help function.
Which module does .readline belong to? How would I search in help for .readline and related methods?
Furthermore is there any way I can use the interpreter to find out which module a method belongs to in future?
Don't try to find the module. Make an instance of the class you want, then call help on the method of that instance, and it will find the correct help info for you. Example:
>>> f = open('pathtosomefile')
>>> help(f.readline)
Help on built-in function readline:
readline(size=-1, /) method of _io.TextIOWrapper instance
    Read until newline or EOF.
    Returns an empty string if EOF is hit immediately.
In my case (Python 3.7.1), it's defined on the type _io.TextIOWrapper (exposed publicly as io.TextIOWrapper, but help doesn't know that), but memorizing that sort of thing isn't very helpful. Knowing how to figure it out by introspecting the specific thing you care about is much more broadly applicable. In this particular case, it's extra important not to try guessing, because the open function can return a few different classes, each with different methods, depending on the arguments provided, including io.BufferedReader, io.BufferedWriter, io.BufferedRandom, and io.FileIO, each with their own version of the readline method (though they all share a similar interface for consistency's sake).
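If you also want the interpreter to tell you which class a method comes from, one possible sketch is to walk the method resolution order of the object's type and stop at the first class that defines the name. The helper defining_class is a name I made up for this example, and io.StringIO stands in for a real file so the snippet runs without touching the disk.

```python
import io


def defining_class(obj, method_name):
    """Return the first class in obj's MRO that defines method_name."""
    for cls in type(obj).__mro__:
        if method_name in vars(cls):
            return cls
    return None


stream = io.StringIO("hello\n")
owner = defining_class(stream, "readline")
print(owner.__module__, owner.__qualname__)
```

The module and class name that get printed are what you would then pass to help.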
From the text of help(open):
open() returns a file object whose type depends on the mode, and
through which the standard file operations such as reading and writing
are performed. When open() is used to open a file in a text mode ('w',
'r', 'wt', 'rt', etc.), it returns a TextIOWrapper. When used to open
a file in a binary mode, the returned class varies: in read binary
mode, it returns a BufferedReader; in write binary and append binary
modes, it returns a BufferedWriter, and in read/write mode, it returns
a BufferedRandom.
See also the section of python's io module documentation on the class hierarchy.
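The mapping described in that help text can be checked directly. This sketch creates a temporary file (the use of tempfile is my choice for the demonstration), opens it in several modes, and records the class name of whatever open returns.

```python
import os
import tempfile

# Create an empty scratch file to open in different modes.
fd, path = tempfile.mkstemp()
os.close(fd)

type_by_mode = {}
try:
    for mode in ("r", "rb", "wb", "ab", "r+b"):
        with open(path, mode) as f:
            type_by_mode[mode] = type(f).__name__
finally:
    os.remove(path)

for mode, name in type_by_mode.items():
    print(mode, name)
```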
So you're looking at TextIOWrapper, BufferedReader, BufferedWriter, or BufferedRandom. These all have their own sets of class hierarchies, but suffice it to say that they share the IOBase superclass at some point - that's where the functions readline() and readlines() are declared. Of course, each subclass implements these functions differently for its particular mode - if you do
import io
help(io.TextIOWrapper.readline)
you should get the documentation you're looking for.
In particular, you're having trouble accessing the documentation for whichever version of readline you need, because you can't be bothered to figure out which class it is. You can actually call help on an object as well. If you're working with a particular file object, then you can spin up a terminal, instantiate it, and then just pass it to help() and it'll show you whatever interface is closest to the surface. Example:
x = open('some_file.txt', 'r')
help(x.readline)

How does open handle context management?

The Python built-ins open and file work with context managers in a way that I don't quite understand.
It is my understanding that open will create a file object, and that file implements the context-manager methods __enter__ and __exit__. I would initially expect __enter__ to implement the actual opening of the file descriptor.
However, using open outside of a with block will return a file which is already open. So, it appears either file.__init__ or open is actually opening the file descriptor, and as far as I can tell file.__enter__ isn't doing anything. Or maybe file.__init__/open calls file.__enter__ directly?
First question:
What is the execution-flow of the open built-in? What does open handle, what does file.__init__ handle, and what does file.__enter__ handle? How does this work when re-using one file object for multiple cycles of opening/closing the file? How is this different from re-using other contextmanager objects for multiple context-cycles?
Second question:
Objects such as file objects have setup steps and teardown steps. The setup occurs in __init__, and the teardown occurs in either close or __exit__.
Is this a good design pattern? Should this design pattern be implemented for custom functions/context managers?
If you look in _pyio.py (a pure-Python implementation of the io module) you find the following code in class IOBase:
### Context manager ###
def __enter__(self): # That's a forward reference
    """Context management protocol. Returns self (an instance of IOBase)."""
    self._checkClosed()
    return self
def __exit__(self, *args):
    """Context management protocol. Calls close()"""
    self.close()
This contains the answers to most of your questions. The important thing to understand is that the context manager's function is to ensure that the file is closed when you are done with it. It does this simply by calling the close function, which saves you the trouble of doing so.
What does file.__enter__ handle? Nothing. It simply returns to you the file object that was the result of the call to the built-in function open().
How does this work when using one file object for multiple cycles of opening and closing the file? The context manager is not very useful for that purpose, since you must explicitly open the file each time.
Is this a good design pattern? Yes, because it reduces the amount of code you have to write, it's easy to read and understand.
Should this pattern be implemented for custom functions/context managers? Any time you have an object that needs to be cleaned up, or has usage that involves some type of open/close concept, you should consider this pattern. The standard library has many other examples.
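To illustrate the point about multiple cycles, here is a small sketch using io.StringIO as a stand-in for a real file (my substitution, so the example needs no file on disk). Once the first with block has closed the object, entering it again fails, because __enter__ checks that the object is not closed.

```python
import io

stream = io.StringIO("some text")

with stream as f:
    first_read = f.read()

# The object is now closed; a second with block cannot reopen it.
try:
    with stream:
        pass
except ValueError as exc:
    reuse_error = str(exc)
else:
    reuse_error = None

print(stream.closed, reuse_error)
```

To start another cycle you have to call open again and get a new object.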
For Question 1
In CPython 2, open() does nothing but create a file object, whose underlying C type is PyFileObject; see the source code in bltinmodule.c and fileobject.c:
static PyObject *
builtin_open(PyObject *self, PyObject *args, PyObject *kwds)
{
    return PyObject_Call((PyObject*)&PyFile_Type, args, kwds);
}
file.__init__ opens the file.
file.__enter__ does nothing except check that the file.fp field is not empty, and then returns the file object.
file.__exit__ invokes the close() method to close the file.
For Question 2
The file type is designed this way for historical reasons.
open and with were introduced in different versions of Python: the with statement did not arrive until Python 2.5 (see PEP 343), by which time open had already been in use for a long time.
For our own custom types, we can follow the design of file or not, depending on the application.
For example, threading.Lock uses a different design: its initialization and its __enter__ are separate steps, so creating the lock does not acquire it.
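A short sketch of that difference: a threading.Lock is created unlocked, is only acquired when the with block is entered, and can be reused for as many with blocks as you like. The states list is just a name used here to record what lock.locked() reports at each step.

```python
import threading

lock = threading.Lock()
states = [lock.locked()]  # created, but not yet acquired

for _ in range(2):
    with lock:
        states.append(lock.locked())  # acquired by __enter__
    states.append(lock.locked())      # released again by __exit__

print(states)
```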

Why can you use open() as context manager?

From Python's source code of open, I think open is just a normal function.
Why can we use it like below?
with open('what_are_context_managers.txt', 'r') as infile:
    for line in infile:
        print('> {}'.format(line))
After all, it neither implements __enter__ and __exit__ nor uses the contextlib.contextmanager decorator.
You are not using the open function as a context manager. It is the result of the open(...) call expression that is the context manager. open() returns a file object, and it is that object that has __enter__ and __exit__ methods; see the io.IOBase documentation:
IOBase is also a context manager and therefore supports the with statement.
You can read the with statement like this:
_context_manager = open('what_are_context_managers.txt', 'r')
with _context_manager as infile:
Note that it is the return value of _context_manager.__enter__() that ends up being assigned to infile here. For file objects, file.__enter__() returns self, so you can get access to the same object that way.
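As a rough illustration, the with statement can be spelled out by hand. This is a simplified expansion (the real statement also passes exception details to __exit__), and io.StringIO is used in place of a real file so the sketch is self-contained.

```python
import io

manager = io.StringIO("first line\n")

# Roughly what "with manager as infile:" does, minus exception handling details.
infile = manager.__enter__()
try:
    same_object = infile is manager
    line = infile.readline()
finally:
    manager.__exit__(None, None, None)

print(same_object, repr(line), manager.closed)
```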
As a side note, you are looking at the wrong open() function. The actual definition of the open() built-in is an alias for io.open(); see the _iomodule.c source code. The alias is set in initstdio() in pylifecycle.c (where io.OpenWrapper is itself an alias for _io.open). And yes, the documentation states the alias points the other way for end-user ease.

Python wmi parameters reversed

Using python's wmi module to create a vss snapshot, I've found that the parameters don't work unless I reverse them:
import wmi
def vss_create():
    shadow_copy_service = wmi.WMI(moniker='winmgmts:\\\\.\\root\\cimv2:Win32_ShadowCopy')
    res = shadow_copy_service.Create('ClientAccessible', 'C:\\')
In the msdn docs, the function is instead supposed to be used this way:
Win32_ShadowCopy.Create("C:\\", "ClientAccessible");
Why is this the case, and is there a way to use the intended order?
Summary
It looks like the parameter ordering for wmi object's methods is reversed from normal by the PyWin32 layer, and this behaviour has been present for at least five years. The relevant wmi spec says that a wmi client can pass the parameters in any order, so PyWin32 is not 'wrong' to do this, although I can't tell if it's deliberate or accident. I speculate that it's unlikely to change, for backwards compatibility reasons, but you can work around this and put the parameters in the order you want by specifying them as keyword parameters: Create(Volume=, Context=).
Details
NB. In the below details, I'm trying to go down in layers from the Python WMI module code to WMI objects accessed by COM in PyWin32 code, to WMI objects as documented and used in other languages, to WMI object specification by MOF files, to specification documents. There's several layers and I write "WMI" a lot, meaning different things at different layers.
When you say "Python's wmi module" do you mean Tim Golden's Python WMI module (link to source code) that builds on PyWin32?
When you get a Python WMI object from the wmi module, the initialization steps it goes through are inside the class _wmi_object, and include querying the underlying wmi object for its available methods:
for m in ole_object.Methods_:
    self.methods[m.Name] = None
I'm going to skip beneath Python's wmi module, and use PyWin32 directly to look at what you get when querying a WMI COM object for its available methods:
>>> from win32com.client import GetObject
>>> vss = GetObject('winmgmts:\\\\.\\root\\cimv2:Win32_ShadowCopy')
>>> [method.Name for method in list(vss.Methods_)]
[u'Create', u'Revert']
And we see the Win32_ShadowCopy object has the methods Create and Revert. So that's where the Python wmi wrapper first learns about the Create method you are using.
From there, the Python WMI wrapper class does some setup work which I haven't traced through fully, but it seems to initialize class _wmi_method once for each available method of the COM object. This class includes these initialization steps:
self.method = ole_object.Methods_ (method_name)
self.in_parameter_names = [(i.Name, i.IsArray) for i in self.in_parameters.Properties_]
A list comprehension to get the available parameters for each method. Going back to my testing to explore that without the Python WMI layer, it gives output like this:
>>> CreateMethod = vss.Methods_('Create')
>>> [n.Name for n in list(CreateMethod.InParameters.Properties_)]
[u'Context', u'Volume']
This test shows that the PyWin32 layer (the COM object for Win32_ShadowCopy and its Create method) lists the available parameters in the order you are seeing: the "wrong" order. The Python WMI layer is picking up that ordering.
When you call the Win32_ShadowCopy object's Create() method through Python WMI's wrapper, the _wmi_method does this:
def __call__ (self, *args, **kwargs):
    for n_arg in range (len (args)):
        arg = args[n_arg]
        parameter = parameters.Properties_[n_arg]
        parameter.Value = arg
In other words, it pairs up the arguments you pass in (*args) with the stored parameter list one by one, taking the arguments in the order you pass them and pairing them with the method parameters in the order WMI returned them. It's not intelligent: it just links the first argument you enter with 'Context' and the second with 'Volume', gets them backwards, and your code crashes.
The call method also includes Python's **kwargs parameter which takes all given keywords, suggesting you can do
Create(Volume='C:\\', Context="ClientAccessible")
and put them in the order you want by using them as keyword arguments. (I haven't tried).
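To show why positional arguments end up swapped while keyword arguments do not, here is a pure-Python simulation. FakeWmiMethod is a hypothetical stand-in that I wrote to mimic the pairing logic described above; it is not the real wmi module and does not touch COM, so it runs on any platform.

```python
class FakeWmiMethod:
    """Hypothetical stand-in for the wrapper's method object (not the real wmi API)."""

    def __init__(self, parameter_names):
        # Parameter names in the order the lower layer reports them.
        self.parameter_names = list(parameter_names)

    def __call__(self, *args, **kwargs):
        # Positional arguments are paired with names purely by position.
        bound = dict(zip(self.parameter_names, args))
        # Keyword arguments are matched by name, so order does not matter.
        bound.update(kwargs)
        return bound


create = FakeWmiMethod(["Context", "Volume"])  # the reversed order seen via COM

positional = create("C:\\", "ClientAccessible")
by_keyword = create(Volume="C:\\", Context="ClientAccessible")
print(positional)
print(by_keyword)
```

In the simulation, the documented positional order assigns the drive to Context, while the keyword form binds each value to the intended name.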
I have tried tracing the .Properties_ lookup through PyWin32com to try and identify where the ordering comes from at the lower layers, and it goes through a long chain of dynamic and cached lookups. I can't see what happens and I don't understand enough COM or PyWin32 to know what kinds of things to be looking for, so that's a dead end for me.
Taking a different approach and trying to find out from the WMI object setup files where the ordering comes from: run mofcomp.exe which ships with Windows and processes Managed Object Format (MOF) files... click Connect, Create Class "Win32_ShadowCopy"; Click the "Create" method in the methods list, then click the "Edit Method" button; then click "Edit Input Arguments" then click "Show MOF", and get this result:
[abstract]
class __PARAMETERS
{
[in, ID(0): DisableOverride ToInstance] string Volume;
[in, ID(1): DisableOverride ToInstance] string Context = "ClientAccessible";
};
That's the "correct" order of the parameters coming out of the Windows MOF files, with numeric IDs for the parameters - implying they have a correct ordering 0, 1, etc.
c:\windows\system32\wbem\vss.mof, the MOF file which appears to cover the Volume Shadow Copy objects contains this:
[static,implemented,constructor] uint32 Create([in] string Volume,[in] string Context = "ClientAccessible",[out] string ShadowID);
and the PowerShell example in the comments at this MSDN link includes $class.create("C:\", "ClientAccessible").
So those three things all tie up with the same ordering and implies there is a correct or standard ordering.
That leaves me thinking of these possibilities:
There is ordering information coming out of PythonCOM and the wmi module should look at it, but doesn't. - I have looked around quickly, and can't find ID / ordering data with the parameters list, so that seems unlikely.
There is ordering information somewhere unknown to me which the PyWin32 COM layer should be looking at but doesn't. - Not sure here.
There is no official ordering. Trying to confirm this point, we get a fun chain:
What is WMI? Microsoft's implementation of the standard management frameworks WBEM and CIM, specified by the DMTF. (DMTF = Distributed Management Task Force, WBEM is Web-Based Enterprise Management, and CIM is the Common Information Model.)
MOF is the Managed Object Format, a text representation of CIM
This document: http://www.dmtf.org/sites/default/files/standards/documents/DSP0221_3.0.0.pdf appears to be the MOF specification. Check section 7.3.3 Class Declaration, from page 18:
line 570:
"A method can have zero or more parameters".
lines 626 through 628:
Method parameters are identified by name and not by position and
clients invoking a method can pass the corresponding arguments in
any order. Therefore parameters with default values can be added to
the method signature at any position.
I don't know for sure if that's an authoritative and current specification, nor have I read all of it looking for exceptions, but it sounds like you should use named parameters.
The WMI objects and methods have a MOF definition, and the MOF specification says you shouldn't rely on the parameter ordering; however, accessing the WMI objects via COM through PyWin32 shows a different ordering from the one in the MSDN docs, the MOF file, and the PowerShell examples. I still don't know why.
And Googling that leads me to this mailing list post by Tim Golden, author of the Python wmi module, saying basically the same thing as I've just found, except five years ago:
the method definition picks up the parameters in the order in which WMI returns them [..]
I've got no idea if there is any guarantee about the order of parameters [..]
Glancing at a few other method definitions, it does seem as though WMI is consistently returning params in the reverse order of their definition in the MOF.
At this point, it looks like PyWin32 is returning a reversed list to the typical Windows parameter order, but is that a bug if the CIM managed object method parameter list specification document explicitly says don't rely on the parameter ordering?

How do you check if an object is an instance of 'file'?

It used to be in Python (2.6) that one could ask:
isinstance(f, file)
but in Python 3.0 file was removed.
What is the proper method for checking to see if a variable is a file now? The What'sNew docs don't mention this...
def read_a_file(f):
    try:
        contents = f.read()
    except AttributeError:
        # f is not a file; handle that case here
        contents = None
Substitute whatever methods you plan to use for read. This is optimal if you expect to be passed a file-like object more than 98% of the time. If you expect to be passed a non-file-like object more than 2% of the time, then the correct thing to do is:
def read_a_file(f):
    if hasattr(f, 'read'):
        contents = f.read()
    else:
        # f is not a file; handle that case here
        contents = None
This is exactly what you would do if you did have access to a file class to test against. (and FWIW, I too have file on 2.6) Note that this code works in 3.x as well.
In python3 you could refer to io instead of file and write
import io
isinstance(f, io.IOBase)
Typically, you don't need to check an object type, you could use duck-typing instead i.e., just call f.read() directly and allow the possible exceptions to propagate -- it is either a bug in your code or a bug in the caller code e.g., json.load() raises AttributeError if you give it an object that has no read attribute.
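The json.load() behaviour mentioned above is easy to see for yourself. In this sketch, io.StringIO plays the role of a file-like object, and a plain string plays the role of something that is not one; both choices are mine, for illustration.

```python
import io
import json

# Anything with a read() method works, whatever its type.
parsed = json.load(io.StringIO('{"ok": true}'))

# A plain string has no read() method, so the error simply propagates.
try:
    json.load("not a file")
except AttributeError as exc:
    error = exc

print(parsed, type(error).__name__)
```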
If you need to distinguish between several acceptable input types; you could use hasattr/getattr:
def read(file_or_filename):
    readfile = getattr(file_or_filename, 'read', None)
    if readfile is not None: # got file
        return readfile()
    with open(file_or_filename) as file: # got filename
        return file.read()
If you want to support the case where file_or_filename may have a read attribute that is set to None, then you could use try/except over file_or_filename.read (note: no parens, the call is not made), e.g., ElementTree._get_writer().
If you want to check certain guarantees e.g., that only one single system call is made (io.RawIOBase.read(n) for n > 0) or there are no short writes (io.BufferedIOBase.write()) or whether read/write methods accept text data (io.TextIOBase) then you could use isinstance() function with ABCs defined in io module e.g., look at how saxutils._gettextwriter() is implemented.
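A brief sketch of such checks against the ABCs in the io module, using in-memory streams so that nothing needs to exist on disk (the checks dictionary is just a convenient way to collect the results):

```python
import io

text_stream = io.StringIO("text")
byte_stream = io.BytesIO(b"bytes")

checks = {
    "StringIO is TextIOBase": isinstance(text_stream, io.TextIOBase),
    "StringIO is BufferedIOBase": isinstance(text_stream, io.BufferedIOBase),
    "BytesIO is BufferedIOBase": isinstance(byte_stream, io.BufferedIOBase),
    "BytesIO is TextIOBase": isinstance(byte_stream, io.TextIOBase),
    "both are IOBase": all(
        isinstance(s, io.IOBase) for s in (text_stream, byte_stream)
    ),
}

for description, result in checks.items():
    print(description, result)
```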
Works for me on python 2.6... Are you in a strange environment where builtins aren't imported by default, or where somebody has done del file, or something?
