What is the difference between os.open and os.fdopen in Python?

I am really confused about when to use os.open and when to use os.fdopen.
I was doing all my work with os.open and it worked without any problem, but I am not able to understand under what conditions we need file descriptors and all the other functions like dup and fsync.
Is a file object different from a file descriptor?
I mean, with f = os.open("file.txt",w),
is f the file object or the file descriptor?

You are confusing the built-in open() function with os.open() provided by the os module. They are quite different: os.open(filename, "w") does not work (os.open expects integer flags as its second argument and raises a TypeError when given a string), whereas open(filename, "w") is correct.
In short, open() creates new file objects, os.open() creates OS-level file descriptors, and os.fdopen() creates a file object out of a file descriptor.
File descriptors are a low-level facility for working with files directly provided by the operating system kernel. A file descriptor is a small integer that identifies the open file in a table of open files kept by the kernel for each process. A number of system calls accept file descriptors, but they are not convenient to work with, typically requiring fixed-width buffers, multiple retries in certain conditions, and manual error handling.
File objects are Python classes that wrap file descriptors to make working with files more convenient and less error-prone. They provide, for example, error-handling, buffering, line-by-line reading, charset conversions, and are closed when garbage collected.
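To make the distinction concrete, here is a small sketch that inspects what each call actually returns. It assumes a throwaway file created with tempfile purely so the example is self-contained; O_BINARY is only defined on Windows, so it falls back to 0 elsewhere.

```python
import os
import tempfile

# Scratch file so the example is self-contained.
tmp_fd, path = tempfile.mkstemp()
os.close(tmp_fd)

# os.open returns a bare integer: an index into the process's descriptor table.
fd = os.open(path, os.O_RDWR | getattr(os, "O_BINARY", 0))  # O_BINARY: Windows only
os.write(fd, b"hello\n")        # raw bytes, no buffering, no encoding
os.lseek(fd, 0, os.SEEK_SET)    # rewind by hand before reading
raw = os.read(fd, 100)          # returns at most 100 bytes
os.close(fd)                    # nothing closes it for you

# Built-in open returns a file object that wraps such a descriptor.
with open(path) as f:
    text = f.readline()         # buffered, decoded, line-oriented
    wrapped_fd = f.fileno()     # the integer descriptor underneath

print(type(fd).__name__, raw)         # int b'hello\n'
print(type(f).__name__, repr(text))   # TextIOWrapper 'hello\n'
os.remove(path)
```

So in the question, f = os.open(...) gives you the descriptor (an int), while open(...) gives you the file object, whose fileno() exposes the descriptor it wraps.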
To recapitulate:
Built-in open() takes a file name and returns a new Python file object. This is what you need in the majority of cases.
os.open() takes a file name and returns a new file descriptor. This file descriptor can be passed to other low-level functions, such as os.read() and os.write(), or to os.fdopen(), as described below. You only need this when writing code that depends on operating-system-dependent APIs, such as using the O_EXCL flag to open(2).
os.fdopen() takes an existing file descriptor (typically produced by Unix system calls such as pipe() or dup()) and builds a Python file object around it. Effectively it converts a file descriptor to a full file object, which is useful when interfacing with C code or with APIs that only create low-level file descriptors.
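The two low-level cases from the recap can be sketched together: an exclusive create with O_EXCL, and a pipe whose raw descriptors are wrapped with os.fdopen. The file name "lockfile.txt" and the scratch directory are illustrative assumptions, not anything from the original answer.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "lockfile.txt")   # hypothetical target path

# O_CREAT | O_EXCL: create the file, but fail if it already exists.
flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
fd = os.open(path, flags, 0o644)
with os.fdopen(fd, "w") as f:       # closing f also closes fd
    f.write("created exactly once\n")

# A second exclusive create of the same path raises FileExistsError.
try:
    os.open(path, flags, 0o644)
    second_create_failed = False
except FileExistsError:
    second_create_failed = True

# os.pipe() only returns raw descriptors; os.fdopen makes them easy to use.
r_fd, w_fd = os.pipe()
with os.fdopen(w_fd, "w") as writer:
    writer.write("through the pipe\n")
with os.fdopen(r_fd) as reader:     # reads until EOF (writer is closed)
    piped = reader.read()

print(second_create_failed, repr(piped))  # True 'through the pipe\n'
os.remove(path)
os.rmdir(workdir)
```

For O_EXCL specifically, Python 3.3+ also lets the built-in open() do this with mode "x", so os.open is mostly needed for flags that open() does not expose.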
Built-in open can be emulated by combining os.open() (to create a file descriptor) and os.fdopen() (to wrap it in a file object):
import os
# functionally equivalent to open(filename, "r")
f = os.fdopen(os.open(filename, os.O_RDONLY))
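The same combination works for writing. This sketch assumes that open(filename, "w") corresponds to write-only, create, and truncate with 0o666 permission bits before the umask; the scratch path is illustrative.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
filename = os.path.join(workdir, "out.txt")   # scratch path

# Roughly what open(filename, "w") does: write-only, create if missing,
# truncate if present, 0o666 permission bits before the umask.
fd = os.open(filename, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
with os.fdopen(fd, "w") as f:
    f.write("written via os.open + os.fdopen\n")

# Read it back with the read-only emulation.
with os.fdopen(os.open(filename, os.O_RDONLY)) as f:
    content = f.read()
print(repr(content))

os.remove(filename)
os.rmdir(workdir)
```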

Related

Python get file path from a file descriptor int (as returned from os.open)

I am using fusepy and I need to convert a file descriptor back into a file object so that I can obtain the original file path.
From the fusepy examples, when a file is created, a file descriptor is returned - for example:
def open(self, path, flags):
    print("open:", path)
    return os.open(path, flags)
The returned result is an integer (<class 'int'>) with the value 4.
In a separate function named write, I need to turn the file descriptor back into a file so that I can get the file path, so I tried this:
f = os.fdopen(fh)
When I check the type of f, I get <class '_io.TextIOWrapper'>.
This is not quite what I was expecting, but a quick dir(f) shows that it has a name property. I thought that was what I was looking for, except that name is simply the number 4...
How can I get the original file path the descriptor points to?
Since this seems to satisfy the need, I'll post it as an answer for the time being. One could access the underlying objects through f.buffer and f.buffer.raw (note that the surprise mentioned in the question depends a bit on the Python version used; in Python 2 this looked different), but that still won't help you access the filename used to open the file. Note: the opening could just as well have taken place in a calling process, with the descriptor inherited by the Python process.
I'm not sure there is a nicer and more portable way, but one can query the OS. On a Unix-like system that provides procfs (Linux, for example), a readily available way is to look at the /proc structure; for the example above:
os.readlink(f"/proc/self/fd/{fh}")
It is still not an entirely trivial question: the descriptor may still be open and in use while the underlying file (its name, i.e. the filesystem reference) has already been deleted.
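A sketch of both observations: os.fdopen gives a file object whose name is just the integer, and the procfs lookup recovers the path (and shows the deleted case). The helper path_from_fd is a hypothetical name; it only works where /proc/self/fd exists, which is an assumption (true on Linux, not on macOS or Windows).

```python
import os
import tempfile

def path_from_fd(fd):
    """Best-effort path lookup for fd through procfs (Linux).

    Returns None when /proc/self/fd is not available, e.g. on macOS or Windows.
    """
    link = f"/proc/self/fd/{fd}"
    if not os.path.islink(link):
        return None
    return os.readlink(link)

tmp_fd, path = tempfile.mkstemp()
os.close(tmp_fd)

fh = os.open(path, os.O_RDONLY)   # what fusepy's open() hands back
f = os.fdopen(fh)                 # wrapping does not recover the path:
name_is_fd = f.name == fh         # name is just the integer

before = path_from_fd(fh)         # e.g. "/tmp/tmpXXXXXX" on Linux
after = None
if before is not None:
    os.remove(path)               # unlink while fh is still open
    after = path_from_fd(fh)      # Linux then appends " (deleted)"

print(name_is_fd, before, after)
f.close()                         # also closes fh
if os.path.exists(path):
    os.remove(path)
```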

Which python module contains file object methods?

While it is simple to look up most methods with help when they have a clear help(module.method) arrangement, for example help(list.extend), I cannot work out how to look up the method .readline() in Python's built-in help function.
Which module does .readline belong to? How would I search in help for .readline and related methods?
Furthermore, is there any way I can use the interpreter to find out which module a method belongs to in the future?
Don't try to find the module. Make an instance of the class you want, then call help on the method of that instance, and it will find the correct help info for you. Example:
>>> f = open('pathtosomefile')
>>> help(f.readline)
Help on built-in function readline:
readline(size=-1, /) method of _io.TextIOWrapper instance
Read until newline or EOF.
Returns an empty string if EOF is hit immediately.
In my case (Python 3.7.1), it's defined on the type _io.TextIOWrapper (exposed publicly as io.TextIOWrapper, but help doesn't know that), but memorizing that sort of thing isn't very helpful. Knowing how to figure it out by introspecting the specific thing you care about is much more broadly applicable. In this particular case, it's extra important not to try guessing, because the open function can return a few different classes, each with different methods, depending on the arguments provided, including io.BufferedReader, io.BufferedWriter, io.BufferedRandom, and io.FileIO, each with their own version of the readline method (though they all share a similar interface for consistency's sake).
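Rather than guessing, you can ask the interpreter which class open() returned for each mode. This sketch uses a scratch temporary file (an assumption for self-containment); buffering=0 is only allowed in binary mode and yields the unbuffered FileIO.

```python
import os
import tempfile

tmp_fd, path = tempfile.mkstemp()
os.close(tmp_fd)

# Record the class open() returns for each (mode, buffering) combination.
results = {}
for mode, buffering in [("r", -1), ("rb", -1), ("wb", -1), ("r+b", -1), ("rb", 0)]:
    with open(path, mode, buffering=buffering) as f:
        results[(mode, buffering)] = type(f).__name__
os.remove(path)

for key, name in results.items():
    print(key, name)
# ('r', -1) TextIOWrapper
# ('rb', -1) BufferedReader
# ('wb', -1) BufferedWriter
# ('r+b', -1) BufferedRandom
# ('rb', 0) FileIO
```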
From the text of help(open):
open() returns a file object whose type depends on the mode, and
through which the standard file operations such as reading and writing
are performed. When open() is used to open a file in a text mode ('w',
'r', 'wt', 'rt', etc.), it returns a TextIOWrapper. When used to open
a file in a binary mode, the returned class varies: in read binary
mode, it returns a BufferedReader; in write binary and append binary
modes, it returns a BufferedWriter, and in read/write mode, it returns
a BufferedRandom.
See also the section of Python's io module documentation on the class hierarchy.
So you're looking at TextIOWrapper, BufferedReader, BufferedWriter, or BufferedRandom. These all have their own sets of class hierarchies, but suffice it to say that they share the IOBase superclass at some point - that's where the functions readline() and readlines() are declared. Of course, each subclass implements these functions differently for its particular mode - if you do
import io
help(io.TextIOWrapper.readline)
you should get the documentation you're looking for.
In particular, you're having trouble accessing the documentation for whichever version of readline you need because it isn't obvious which class it belongs to. You can actually call help on an object as well. If you're working with a particular file object, you can spin up a terminal, instantiate it, and then pass it (or one of its methods) to help(), and it'll show you whatever interface is closest to the surface. Example:
x = open('some_file.txt', 'r')
help(x.readline)
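To answer the "which class does this method belong to" part with the interpreter itself, you can walk the object's method resolution order and find the first class whose own namespace defines the name. The helper defining_class is hypothetical (not a standard-library function), and the temporary file is only there to make the sketch self-contained.

```python
import os
import tempfile

def defining_class(obj, attr):
    """Return the first class in type(obj).__mro__ whose own namespace
    defines attr, or None if no class in the MRO does."""
    for klass in type(obj).__mro__:
        if attr in vars(klass):
            return klass
    return None

tmp_fd, path = tempfile.mkstemp()
os.close(tmp_fd)
with open(path) as f:
    ftype = type(f)
    owner = defining_class(f, "readline")
    print(owner)             # the class to pass to help()
    print(ftype.__mro__)     # every class the lookup walks through
os.remove(path)
```

Passing the returned class's attribute to help (for example help(owner.readline)) then shows the same documentation as help(f.readline).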

How does open handle context management?

The Python built-ins open and file work with context managers in a way that I don't quite understand.
My understanding is that open creates a file object. file implements the context-manager methods __enter__ and __exit__. I would initially expect __enter__ to do the actual opening of the file descriptor.
However, using open outside of a with block returns a file that is already open. So it appears that either file.__init__ or open actually opens the file descriptor, and as far as I can tell file.__enter__ isn't doing anything. Or maybe file.__init__/open calls file.__enter__ directly?
First question:
What is the execution-flow of the open built-in? What does open handle, what does file.__init__ handle, and what does file.__enter__ handle? How does this work when re-using one file object for multiple cycles of opening/closing the file? How is this different from re-using other contextmanager objects for multiple context-cycles?
Second question:
Objects such as file objects have setup steps and teardown steps. The setup occurs in __init__, and the teardown occurs in either close or __exit__.
Is this a good design pattern? Should this design pattern be implemented for custom functions/context managers?
If you look in _pyio.py (a pure-Python implementation of the io module) you find the following code in class IOBase:
    ### Context manager ###

    def __enter__(self):  # That's a forward reference
        """Context management protocol. Returns self (an instance of IOBase)."""
        self._checkClosed()
        return self

    def __exit__(self, *args):
        """Context management protocol. Calls close()"""
        self.close()
This contains the answers to most of your questions. The important thing to understand is that the context manager's job is to ensure that you close the file when you are done with it. It does this simply by calling the close method, which saves you the trouble of doing so.
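The behaviour described by that _pyio code can be observed directly with a real file object. This sketch (using a scratch temporary file as an assumption) shows that the file is already open before the with statement, that __enter__ returns the same object, that __exit__ closes it, and that a closed file object cannot be entered again.

```python
import os
import tempfile

tmp_fd, path = tempfile.mkstemp()
os.close(tmp_fd)

f = open(path, "w")
was_open_before = not f.closed   # already open before any with statement
with f as same:                  # __enter__ checks f is open, then returns f
    same_is_f = same is f
closed_after = f.closed          # __exit__ called close()

try:
    with f:                      # try to reuse the closed file object
        pass
    reenter_failed = False
except ValueError:
    reenter_failed = True        # closed file objects cannot be reopened

print(was_open_before, same_is_f, closed_after, reenter_failed)  # True True True True
os.remove(path)
```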
What does file.__enter__ handle? Nothing. It simply returns to you the file object that was the result of the call to the built-in function open().
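How does this work when using one file object for multiple cycles of opening and closing the file? It doesn't: once a file object is closed it cannot be reopened, and entering a with block on it again raises ValueError (that is what the _checkClosed() call does). You must call open() again to get a new file object for each cycle.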
Is this a good design pattern? Yes, because it reduces the amount of code you have to write, it's easy to read and understand.
Should this pattern be implemented for custom functions/context managers? Any time you have an object that needs to be cleaned up, or has usage that involves some type of open/close concept, you should consider this pattern. The standard library has many other examples.
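A minimal sketch of the same pattern for a custom class: acquire in __init__, release in an idempotent close(), and have __exit__ call close(). ManagedResource and its log list are hypothetical, standing in for whatever real resource (connection, handle) you manage.

```python
class ManagedResource:
    """Hypothetical resource following the file-object pattern:
    set up in __init__, tear down in close(), and __exit__ calls close()."""

    def __init__(self, name):
        self.name = name
        self.closed = False
        self.log = [f"open {name}"]      # setup happens eagerly, like open()

    def close(self):
        if not self.closed:              # safe to call more than once
            self.log.append(f"close {self.name}")
            self.closed = True

    def __enter__(self):
        if self.closed:                  # mirrors IOBase._checkClosed()
            raise ValueError("operation on closed resource")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False                     # do not swallow exceptions

with ManagedResource("db") as r:
    r.log.append("use db")
print(r.log)   # ['open db', 'use db', 'close db']

# Usable without with, too; the caller is then responsible for close().
r2 = ManagedResource("cache")
r2.close()
```

For objects that already have a close() method but no __exit__, the standard library's contextlib.closing gives the same with-statement behaviour without writing the dunder methods.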
For Question 1
In CPython 2, open() does nothing but create a file object, whose underlying C type is PyFileObject; see the source code in bltinmodule.c and fileobject.c:
static PyObject *
builtin_open(PyObject *self, PyObject *args, PyObject *kwds)
{
    return PyObject_Call((PyObject*)&PyFile_Type, args, kwds);
}
file.__init__ opens the file.
file.__enter__ indeed does nothing except check that the field file.fp is not NULL (i.e., that the file is still open).
file.__exit__ invokes the close() method to close the file.
For Question 2
The file object is designed this way for historical reasons.
open and with were introduced in different versions of CPython: with only arrived in Python 2.5 (see PEP 343), by which time open had been in use for a long time.
For our own types, we may or may not follow the file design, depending on the concrete application context.
For example, threading.Lock uses a different design: creating the lock (__init__) and acquiring it (__enter__) are separate steps.
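The difference in design is visible in a few lines: constructing a Lock does not acquire it, and the same Lock object can be entered and exited repeatedly, unlike a closed file object.

```python
import threading

lock = threading.Lock()      # constructing the lock does not acquire it
states = [lock.locked()]     # False

for _ in range(2):           # the same Lock object is reused across blocks
    with lock:               # __enter__ acquires
        states.append(lock.locked())   # True
    states.append(lock.locked())       # False: __exit__ released it

print(states)   # [False, True, False, True, False]
```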

What's the difference between io.open() and os.open() on Python?

I realised that the open() function I've been using was an alias to io.open() and that importing * from os would shadow it.
What's the difference between opening files through the io module and os module?
io.open() is the preferred, higher-level interface to file I/O. It wraps the OS-level file descriptor in an object that you can use to access the file in a Pythonic manner.
os.open() is just a wrapper for the lower-level POSIX syscall. It takes less symbolic (and more POSIX-y) arguments, and returns the file descriptor (a number) that represents the opened file. It does not return a file object; the returned value will not have read() or write() methods.
From the os.open() documentation:
This function is intended for low-level I/O. For normal usage, use the built-in function open(), which returns a “file object” with read() and write() methods (and many more).
Absolutely everything:
os.open() takes a filename as a string, the flags as a bitwise OR of os.O_* constants, and an optional argument that describes the file permission bits, and returns a file descriptor as an integer.
io.open() takes a filename as a string or a file descriptor as an integer, the file mode as a string, and optional arguments that describe the file encoding, buffering used, how encoding errors and newlines are handled, and if the underlying FD is closed when the file is closed, and returns some descendant of io.IOBase.
os.open is very similar to open() from C in Unix. You're unlikely to want to use it unless you're doing something much more low-level. It gives you an actual file descriptor (as in, a number, not an object).
io.open is your basic Python open() and what you want to use just about all the time.
To add to the existing answers:
I realised that the open() function I've been using was an alias to io.open()
open() == io.open() in Python 3 only. In Python 2 they are different.
While open() in Python gives us an easy-to-use file object with handy read() and write() methods, at the OS level files are accessed using file descriptors (or file handles on Windows). Thus, something equivalent to os.open() must be used implicitly under the hood. I haven't examined the Python source code in this regard, but the documentation for the opener parameter, which was added to open() in Python 3.3, says:
A custom opener can be used by passing a callable as opener. The
underlying file descriptor for the file object is then obtained by
calling opener with (file, flags). opener must return an open file
descriptor (passing os.open as opener results in functionality similar
to passing None).
So os.open() is effectively the default opener for open(), and we also have the ability to specify a custom wrapper around it if file flags or mode need to be changed. See the documentation for open() for an example of a custom opener, which opens a file relative to a given directory.
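A sketch adapted from the documentation's directory-relative opener example. It is POSIX-only (os.open on a directory and the dir_fd parameter are not supported on Windows), and the temporary directory stands in for a real "somedir".

```python
import os
import tempfile

somedir = tempfile.mkdtemp()              # stands in for a real directory
dir_fd = os.open(somedir, os.O_RDONLY)    # descriptor for the directory itself

def opener(path, flags):
    # open() calls this with (file, flags) and expects an open fd back.
    # Resolving path relative to dir_fd is the custom part.
    return os.open(path, flags, dir_fd=dir_fd)

try:
    with open("spamspam.txt", "w", opener=opener) as f:
        print("This will be written to somedir/spamspam.txt", file=f)
finally:
    os.close(dir_fd)                      # the directory fd is ours to close

target = os.path.join(somedir, "spamspam.txt")
with open(target) as f:
    content = f.read()
print(repr(content))

os.remove(target)
os.rmdir(somedir)
```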
In Python 2, the built-in open and io.open were different (io.open was newer and supported more things). In Python 3, open and io.open are now the same thing (they got rid of the old built-in open), so you should always use open. Code that needs to be compatible with Python 2 and 3 might have a reason to use io.open.
The code below confirms this:
import io
with io.open("../input/files.txt") as f:
    text = f.read().lower()
with open('../input/files.txt', encoding='utf-8') as f2:
    text2 = f2.read().lower()
print(type(f))
print(type(f2))
# <class '_io.TextIOWrapper'>
# <class '_io.TextIOWrapper'>
Database and system application developers usually use open instead of fopen as the former provides finer control on when, what and how the memory content should be written to its backing store (i.e., file on disk).
In Unix-like operating systems, open is used to open regular files, devices, named pipes, etc. A non-negative file descriptor number is returned for every successful open call. It provides a consistent API and framework for event notification, etc., across a variety of these objects.
However, fopen is a standard C function that is normally used to open a regular file and returns a pointer to a FILE structure. On Unix-like systems, fopen actually calls open under the hood. fopen is good enough for normal usage, as developers do not need to worry about when to flush or sync memory content to the disk and do not need event notification.
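The same two layers exist in Python, and this is where fsync from the original question comes in. A sketch, assuming a scratch file: with a buffered file object you flush Python's buffer and then fsync the underlying descriptor; with a raw descriptor each os.write is a system call and you decide exactly when to fsync.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "journal.log")   # scratch path

# Buffered file object (the fopen-like layer).
with open(path, "w") as f:
    f.write("record 1\n")
    f.flush()                 # Python's buffer -> kernel
    os.fsync(f.fileno())      # kernel cache -> storage device

# Raw descriptor (the open(2)-like layer): no Python-side buffering.
fd = os.open(path, os.O_WRONLY | os.O_APPEND)
try:
    os.write(fd, b"record 2\n")
    os.fsync(fd)
finally:
    os.close(fd)

with open(path) as f:
    content = f.read()
print(content, end="")

os.remove(path)
os.rmdir(workdir)
```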
os.open() opens the file path, sets various flags according to flags, and possibly sets its mode according to mode.
The default mode is 0o777 (octal), and the current umask value is first masked out.
This method returns the file descriptor for the newly opened file.
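The mode/umask interaction can be checked directly. This sketch assumes a POSIX system (Windows largely ignores these permission bits) and a scratch directory; reading the umask by setting and restoring it is not thread-safe.

```python
import os
import stat
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "secret.txt")   # scratch path

# os.umask() sets a new mask and returns the old one, so set-and-restore
# is the usual way to read it (not thread-safe).
old_umask = os.umask(0)
os.umask(old_umask)

# Ask for rw-r----- instead of the 0o777 default.
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o640)
os.close(fd)

actual = stat.S_IMODE(os.stat(path).st_mode)
expected = 0o640 & ~old_umask
print(oct(actual), oct(expected))   # equal on POSIX systems

os.remove(path)
os.rmdir(workdir)
```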
Meanwhile,
io.open() opens a file in the mode specified by the string mode. It returns a new file object or, in case of errors, raises an OSError.
Hope this helps

Design of a python pickleable object that describes a file

I would like to create a class that describes a file resource and then pickle it. This part is straightforward. To be concrete, let's say that I have a class "A" that has methods to operate on a file. I can pickle this object if it does not contain a file handle. I want to be able to create a file handle in order to access the resource described by "A". If I have an "open()" method in class "A" that opens and stores the file handle for later use, then "A" is no longer pickleable. (I should add that opening the file includes some non-trivial indexing which cannot be cached (third-party code), so closing and reopening when needed is not without expense.) I could code class "A" as a factory that can generate file handles to the described file, but that could result in multiple file handles accessing the file contents simultaneously. I could use another class "B" to handle the opening of the file in class "A", including locking, etc. I am probably overthinking this, but any hints would be appreciated.
The question isn't too clear; what it looks like is that:
you have a third-party module which has picklable classes
those classes may contain references to files, which makes the classes themselves not picklable because open files aren't picklable.
Essentially, you want to make open files picklable. You can do this fairly easily, with certain caveats. Here's an incomplete but functional sample:
import pickle

class PicklableFile(object):
    def __init__(self, fileobj):
        self.fileobj = fileobj

    def __getattr__(self, key):
        # Only reached for attributes not found normally. Guard against
        # infinite recursion if fileobj itself is missing (e.g. mid-unpickle).
        if key == 'fileobj':
            raise AttributeError(key)
        return getattr(self.fileobj, key)

    def __getstate__(self):
        ret = self.__dict__.copy()
        ret['_file_name'] = self.fileobj.name
        ret['_file_mode'] = self.fileobj.mode
        ret['_file_pos'] = self.fileobj.tell()
        del ret['fileobj']
        return ret

    def __setstate__(self, state):
        # Reopen by name and mode, then restore the saved position.
        self.fileobj = open(state.pop('_file_name'), state.pop('_file_mode'))
        self.fileobj.seek(state.pop('_file_pos'))
        self.__dict__.update(state)

f = PicklableFile(open("/tmp/blah"))
print(f.readline())
data = pickle.dumps(f)
f2 = pickle.loads(data)
print(f2.read())
f.close()
f2.close()
Caveats and notes, some obvious, some less so:
This class should operate directly on the file object you got from open. If you're using wrapper classes on files, like gzip.GzipFile, those should go above this, not below it. Logically, treat this as a decorator class on top of file.
If the file doesn't exist when you unpickle, it can't be unpickled and will throw an exception.
If it's a different file, the behavior may or may not make sense.
If the file mode includes file creation ('w+'), and the file doesn't exist, it'll be created; we don't know what file permissions to use, since that's not stored with the file. If this is important--it probably shouldn't be--then store the correct permissions in the class when you first create it.
If the file isn't seekable, trying to seek to the old position may raise IOError; if you're using a file like that you'll need to decide how to handle that.
The file classes in Python 2 and Python 3 are different; there's no file class in Python 3. Even if you're only using Python 2 right now, don't subclass file.
I'd steer away from doing this; having pickled data dependent on external files not changing and staying in the same place is brittle. This makes it difficult to even relocate files, since your pickled data won't make sense.
If you open a pointer to a file, pickle it, then attempt to reconstitute it later, there is no guarantee that the file will still be available for opening.
To elaborate, the file pointer really represents a connection to the file. Just like a database connection, you can't "pickle" the other end of the connection, so this won't work.
Is it possible to keep the file pointer around in memory in its own process instead?
It sounds like you know you can't pickle the handle, and you're ok with that, you just want to pickle the part that can be pickled. As your object stands now, it can't be pickled because it has the handle. Do I have that right? If so, read on.
The pickle module lets your class describe its own state to pickle, for exactly these cases: you define your own __getstate__ method. The pickler invokes it to get the state to be pickled; only if the method is missing does it fall back to the default of trying to pickle all the attributes.
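A minimal sketch of that approach for the question's class "A": only the description (the path) is pickled, the open handle is dropped in __getstate__, and the handle is reopened lazily after unpickling. FileResource is a hypothetical name, and the expensive third-party indexing is only marked by a comment.

```python
import builtins
import os
import pickle
import tempfile

class FileResource:
    """Hypothetical stand-in for the question's class "A"."""

    def __init__(self, path):
        self.path = path
        self._handle = None

    def open(self):
        if self._handle is None:
            # Any expensive indexing would happen here, once per process.
            self._handle = builtins.open(self.path)
        return self._handle

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_handle"] = None   # never try to pickle the open file
        return state
    # No __setstate__ needed: the default restores __dict__ as-is.

tmp_fd, path = tempfile.mkstemp()
os.write(tmp_fd, b"payload\n")
os.close(tmp_fd)

a = FileResource(path)
first = a.open().readline()
b = pickle.loads(pickle.dumps(a))   # succeeds even though a's handle is open
reopened_lazily = b._handle is None
second = b.open().readline()

print(first == second, reopened_lazily)   # True True
a.close()
b.close()
os.remove(path)
```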
