I realised that the open() function I've been using was an alias to io.open() and that importing * from os would overshadow that.
What's the difference between opening files through the io module and os module?
io.open() is the preferred, higher-level interface to file I/O. It wraps the OS-level file descriptor in an object that you can use to access the file in a Pythonic manner.
os.open() is just a wrapper for the lower-level POSIX syscall. It takes less symbolic (and more POSIX-y) arguments, and returns the file descriptor (a number) that represents the opened file. It does not return a file object; the returned value will not have read() or write() methods.
From the os.open() documentation:
This function is intended for low-level I/O. For normal usage, use the built-in function open(), which returns a “file object” with read() and write() methods (and many more).
Absolutely everything:
os.open() takes a filename as a string, the file mode as a bitwise mask of attributes, and an optional argument that describes the file permission bits, and returns a file descriptor as an integer.
io.open() takes a filename as a string or a file descriptor as an integer, the file mode as a string, and optional arguments that describe the file encoding, buffering used, how encoding errors and newlines are handled, and if the underlying FD is closed when the file is closed, and returns some descendant of io.IOBase.
os.open is very similar to open() from C in Unix. You're unlikely to want to use it unless you're doing something much more low-level. It gives you an actual file descriptor (as in, a number, not an object).
io.open is your basic Python open() and what you want to use just about all the time.
To add to the existing answers:
I realised that the open() function I've been using was an alias to io.open()
open() == io.open() in Python 3 only. In Python 2 they are different.
While with open() in Python we can obtain an easy-to-use file object with handy read() and write() methods, on the OS level files are accessed using file descriptors (or file handles in Windows). Thus, os.open() should be used implicitly under the hood. I haven't examined Python source code in this regard, but the documentation for the opener parameter, which was added for open() in Python 3.3, says:
A custom opener can be used by passing a callable as opener. The
underlying file descriptor for the file object is then obtained by
calling opener with (file, flags). opener must return an open file
descriptor (passing os.open as opener results in functionality similar
to passing None).
So os.open() is the default opener for open(), and we also have the ability to specify a custom wrapper around it if file flags or mode need to be changed. See the documentation for open() for an example of a custom opener, which opens a file relative to a given directory.
In Python 2, the built-in open and io.open were different (io.open was newer and supported more things). In Python 3, open and io.open are now the same thing (they got rid of the old built-in open), so you should always use open. Code that needs to be compatible with Python 2 and 3 might have a reason to use io.open.
Below code to validate this.
import io
with io.open("../input/files.txt") as f:
text = f.read().lower()
with open('../input/files.txt', encoding='utf-8') as f2:
text2 = f2.read().lower()
print(type(f))
print(type(f2))
# <class '_io.TextIOWrapper'>
# <class '_io.TextIOWrapper'>
Database and system application developers usually use open instead of fopen as the former provides finer control on when, what and how the memory content should be written to its backing store (i.e., file on disk).
In Unix-like operating system, open is used to open regular file, socket end-point, device, pipe, etc. A positive file descriptor number is returned for every successful open function call. It provides a consistent API and framework to check for event notification, etc on a variety of these objects.
However, fopen is a standard C function and is normally used to open regular file and return a FILE data structure. fopen, actually, will call open eventually. fopen is good enough for normal usage as developers do not need to worry when to flush or sync memory content to the disk and do not need event notification.
os.open() method opens the file file and set various flags according to flags and possibly its mode according to mode.
The default mode is 0777 (octal), and the current unmask value is first masked out.
This method returns the file descriptor for the newly opened file.
While,
io.open() method opens a file, in the mode specified in the string mode. It returns a new file handle, or, in case of errors, nil plus an error message.
Hope this helps
Related
I am using fusepy and I need to convert a file descriptor back in to a file object so that I can obtain the original file path
From the fusepy examples, when a file is created, a file descriptor is returned - for example:
def open(self, path, flags):
print("open:", path)
return os.open(path, flags)
the returned result is an integer: <class 'int'> with the value of 4
in a separate function named write, I need to reverse the file descriptor back in to a file so that I can get the file path, so I tried this:
f = os.fdopen(fh)
When I check the type of f I get the following f is type: <class '_io.TextIOWrapper'>
Which is not quite what I was expecting but a quick dir(f) shows that it has a name property, I thought that's what I was looking for, except name is simply the number 4...
How can I get the original file path the descriptor points to?
Since this seems to satisfy the need, I'll post it as an answer for the time being. One could access underlying objects through f.buffer and f.buffer.raw (not the surprise mentioned in question depends a bit on Python version used, in v2 this looked different), but that still won't help accessing the filename used top open the file. Note: this could have just as well taken place in a calling process and a descriptor could have been inherited by the python process.
Not sure if there is a nicer and more portable way, but one can query OS and on U*X like system a readily available way would be to reference procfs structure, namely for the above example:
os.readlink(f"/proc/self/fd/{fh}")
Still not an entirely trivial question. Descriptor may still be open and used, while the underlying file(name; filesystem reference) has already been deleted.
While it is simple to search by using help for most methods that have a clear help(module.method) arrangement, for example help(list.extend), I cannot work out how to look up the method .readline() in python's inbuilt help function.
Which module does .readline belong to? How would I search in help for .readline and related methods?
Furthermore is there any way I can use the interpreter to find out which module a method belongs to in future?
Don't try to find the module. Make an instance of the class you want, then call help on the method of that instance, and it will find the correct help info for you. Example:
>>> f = open('pathtosomefile')
>>> help(f.readline)
Help on built-in function readline:
readline(size=-1, /) method of _io.TextIOWrapper instance
Read until newline or EOF.
Returns an empty string if EOF is hit immediately.
In my case (Python 3.7.1), it's defined on the type _io.TextIOWrapper (exposed publicly as io.TextIOWrapper, but help doesn't know that), but memorizing that sort of thing isn't very helpful. Knowing how to figure it out by introspecting the specific thing you care about is much more broadly applicable. In this particular case, it's extra important not to try guessing, because the open function can return a few different classes, each with different methods, depending on the arguments provided, including io.BufferedReader, io.BufferedWriter, io.BufferedRandom, and io.FileIO, each with their own version of the readline method (though they all share a similar interface for consistency's sake).
From the text of help(open):
open() returns a file object whose type depends on the mode, and
through which the standard file operations such as reading and writing
are performed. When open() is used to open a file in a text mode ('w',
'r', 'wt', 'rt', etc.), it returns a TextIOWrapper. When used to open
a file in a binary mode, the returned class varies: in read binary
mode, it returns a BufferedReader; in write binary and append binary
modes, it returns a BufferedWriter, and in read/write mode, it returns
a BufferedRandom.
See also the section of python's io module documentation on the class hierarchy.
So you're looking at TextIOWrapper, BufferedReader, BufferedWriter, or BufferedRandom. These all have their own sets of class hierarchies, but suffice it to say that they share the IOBase superclass at some point - that's where the functions readline() and readlines() are declared. Of course, each subclass implements these functions differently for its particular mode - if you do
help(_io.TextIOWrapper.readline)
you should get the documentation you're looking for.
In particular, you're having trouble accessing the documentation for whichever version of readline you need, because you can't be bothered to figure out which class it is. You can actually call help on an object as well. If you're working with a particular file object, then you can spin up a terminal, instantiate it, and then just pass it to help() and it'll show you whatever interface is closest to the surface. Example:
x = open('some_file.txt', 'r')
help(x.readline)
The python built-ins open, and file work with context managers in a way that I don't quite understand.
It is to my understanding that open will create a file. file implements the context-manager methods __enter__ and __exit__. I would initially expect __enter__ to implement the actual opening of the file descriptor.
However, using open outside of a with block will return a file which is already open. So, it appears either file.__init__ or open is actually opening the file descriptor, and as far as I can tell file.__enter__ isn't doing anything. Or maybe file.__init__/open calls file.__enter__ directly?
First question:
What is the execution-flow of the open built-in? What does open handle, what does file.__init__ handle, and what does file.__enter__ handle? How does this work when re-using one file object for multiple cycles of opening/closing the file? How is this different from re-using other contextmanager objects for multiple context-cycles?
Second question:
Objects such as file objects have a setup steps and teardown steps. The setup occurs in __init__ , and the tear-down occurs in either close or __exit__.
Is this a good design pattern? Should this design pattern be implemented for custom functions/context managers?
If you look in _pyio.py (a pure-Python implementation of the io module) you find the following code in class IOBase:
### Context manager ###
def __enter__(self): # That's a forward reference
"""Context management protocol. Returns self (an instance of IOBase)."""
self._checkClosed()
return self
def __exit__(self, *args):
"""Context management protocol. Calls close()"""
self.close()
This contains the answers to most of your questions. The important thing to understand is that the context manager's function is to insure that you close the file when you are done with it. It does this simply by calling the close function, which saves you the trouble of doing so.
What does file.__enter__ handle? Nothing. It simply returns to you the file object that was the result of the call to the built-in function open().
How does this work when using one file object for multiple cycles of opening and closing the file? The context manager is not very useful for that purpose, since you must explicitly open the file each time.
Is this a good design pattern? Yes, because it reduces the amount of code you have to write, it's easy to read and understand.
Should this pattern be implemented for custom functions/context managers? Any time you have an object that needs to be cleaned up, or has usage that involves some type of open/close concept, you should consider this pattern. The standard library has many other examples.
For Question 1
In CPython, open() does nothing but creating a file object, which the underlying C type is PyFileObject; See source code in bltinmodule.c and fileobject.c
static PyObject *
builtin_open(PyObject *self, PyObject *args, PyObject *kwds)
{
return PyObject_Call((PyObject*)&PyFile_Type, args, kwds);
}
file.__init__ would open the file
file.__enter__ indeed do nothing except doing empty check on field file.fp
file.__exit__ invoke close() method to close file
For Question 2
Why file design like this is due to a historical reason.
open and with are two different keywords introduced on different versions of CPython. with was introduced till Python 2.5 (see PEP 343). At that time, open has been used for a long time.
For our customized type, we could design like file or not, depends on the concrete application context.
For example, threading.Lock is a different design, its init and enter are separately.
I am really confused when to use os.open and when to use os.fdopen
I was doing all my work with os.open and it worked without any problem but I am not able to understand under what conditions we need file descriptors and all other functions like dup and fsync
Is the file object different from file descriptor
i mean f = os.open("file.txt",w)
Now is f the fileobject or its the filedescriptor?
You are confusing the built-in open() function with os.open() provided by the os module. They are quite different; os.open(filename, "w") is not valid Python (os.open accepts integer flags as its second argument), open(filename, "w") is.
In short, open() creates new file objects, os.open() creates OS-level file descriptors, and os.fdopen() creates a file object out of a file descriptor.
File descriptors are a low-level facility for working with files directly provided by the operating system kernel. A file descriptor is a small integer that identifies the open file in a table of open files kept by the kernel for each process. A number of system calls accept file descriptors, but they are not convenient to work with, typically requiring fixed-width buffers, multiple retries in certain conditions, and manual error handling.
File objects are Python classes that wrap file descriptors to make working with files more convenient and less error-prone. They provide, for example, error-handling, buffering, line-by-line reading, charset conversions, and are closed when garbage collected.
To recapitulate:
Built-in open() takes a file name and returns a new Python file object. This is what you need in the majority of cases.
os.open() takes a file name and returns a new file descriptor. This file descriptor can be passed to other low-level functions, such as os.read() and os.write(), or to os.fdopen(), as described below. You only need this when writing code that depends on operating-system-dependent APIs, such as using the O_EXCL flag to open(2).
os.fdopen() takes an existing file descriptor — typically produced by Unix system calls such as pipe() or dup(), and builds a Python file object around it. Effectively it converts a file descriptor to a full file object, which is useful when interfacing with C code or with APIs that only create low-level file descriptors.
Built-in open can be emulated by combining os.open() (to create a file descriptor) and os.fdopen() (to wrap it in a file object):
# functionally equivalent to open(filename, "r")
f = os.fdopen(os.open(filename, os.O_RDONLY))
Right, simple question here. I couldn't find an answer using google and looking here.
I need to write a few stubs for builtin functions within python like open(name[, mode[, buffering]]). However I can't seem to find the default values for mode and buffering
It does not seem to be None
For making wrappers for the built-ins, what you usually end up doing is something like:
def myOpen(name, mode='r', buffer=None):
if buffer:
open_file = open(name, mode, buffer)
else:
open_file = open(name, mode)
The reason is that not all the arguments are accesible via keyword (buffer in this case).
See the documentation of open():
If mode is omitted, it defaults to 'r'.
...
The optional buffering argument specifies the file’s desired buffer size (...) If omitted, the system default is used.
When it comes to the system default for buffering:
Specifying a buffer size currently has no effect on systems that don’t have setvbuf(). The interface to specify the buffer size is not done using a method that calls setvbuf(), because that may dump core when called after any I/O has been performed, and there’s no reliable way to determine whether this is the case.