Which python module contains file object methods? - python

While it is simple to look up most methods with help when they fit the clear help(module.method) pattern, for example help(list.extend), I cannot work out how to look up the method .readline() with Python's built-in help function.
Which module does .readline belong to? How would I search in help for .readline and related methods?
Furthermore is there any way I can use the interpreter to find out which module a method belongs to in future?

Don't try to find the module. Make an instance of the class you want, then call help on the method of that instance, and it will find the correct help info for you. Example:
>>> f = open('pathtosomefile')
>>> help(f.readline)
Help on built-in function readline:
readline(size=-1, /) method of _io.TextIOWrapper instance
Read until newline or EOF.
Returns an empty string if EOF is hit immediately.
In my case (Python 3.7.1), it's defined on the type _io.TextIOWrapper (exposed publicly as io.TextIOWrapper, but help doesn't know that), but memorizing that sort of thing isn't very helpful. Knowing how to figure it out by introspecting the specific thing you care about is much more broadly applicable. In this particular case, it's extra important not to try guessing, because the open function can return a few different classes, each with different methods, depending on the arguments provided, including io.BufferedReader, io.BufferedWriter, io.BufferedRandom, and io.FileIO, each with their own version of the readline method (though they all share a similar interface for consistency's sake).
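For instance, a quick sketch of that kind of introspection (the file name here is just a placeholder and must exist):

f = open('example.txt')            # text mode
print(type(f))                     # <class '_io.TextIOWrapper'>
fb = open('example.txt', 'rb')     # binary read mode
print(type(fb))                    # <class '_io.BufferedReader'>
help(type(f).readline)             # same documentation as help(f.readline)
f.close()
fb.close()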

From the text of help(open):
open() returns a file object whose type depends on the mode, and
through which the standard file operations such as reading and writing
are performed. When open() is used to open a file in a text mode ('w',
'r', 'wt', 'rt', etc.), it returns a TextIOWrapper. When used to open
a file in a binary mode, the returned class varies: in read binary
mode, it returns a BufferedReader; in write binary and append binary
modes, it returns a BufferedWriter, and in read/write mode, it returns
a BufferedRandom.
See also the section of python's io module documentation on the class hierarchy.
So you're looking at TextIOWrapper, BufferedReader, BufferedWriter, or BufferedRandom. These all have their own sets of class hierarchies, but suffice it to say that they share the IOBase superclass at some point - that's where the functions readline() and readlines() are declared. Of course, each subclass implements these functions differently for its particular mode - if you do
import io
help(io.TextIOWrapper.readline)
you should get the documentation you're looking for.
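If you want to see that hierarchy for yourself, the method resolution order lists the parent classes; on a recent CPython 3.x it looks roughly like this:

>>> import io
>>> io.TextIOWrapper.__mro__
(<class '_io.TextIOWrapper'>, <class '_io._TextIOBase'>, <class '_io._IOBase'>, <class 'object'>)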
In particular, you're having trouble accessing the documentation for whichever version of readline you need, because you can't be bothered to figure out which class it is. You can actually call help on an object as well. If you're working with a particular file object, then you can spin up a terminal, instantiate it, and then just pass it to help() and it'll show you whatever interface is closest to the surface. Example:
x = open('some_file.txt', 'r')
help(x.readline)

Related

Is it possible to uniformly save any object in a JSON file?

I'm working on a web-server type of application and as part of multi-language communication I need to serialize objects in a JSON file. The issue is that I'd like to create a function which can take any custom object and save it at run time rather than limiting the function to what type of objects it can store based on structure.
Apologies if this question is a duplicate, however from what I have searched the other questions and answers do not seem to tackle the dynamic structure aspect of the problem, thus leading me to open this question.
The function is going to be used to communicate between PHP server code and Python scripts, hence the need for such a solution
I have attempted to use the json.dump(data,outfile) function, however the issue is that I need to convert such objects to a legal data structure first
JSON is a rigidly structured format, and Python's json module, by design, won't try to coerce types it doesn't understand.
Check out this SO answer. While __dict__ might work in some cases, it's often not exactly what you want. One option is to write one or more classes that inherit json.JSONEncoder and override its default() method so it turns your type or types into basic types that json.dump can understand.
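A minimal sketch of that first option, using a made-up Point type (the class name and fields are purely illustrative):

import json

class Point:                                        # hypothetical custom type
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointEncoder(json.JSONEncoder):
    def default(self, obj):
        # Convert objects json doesn't know about into basic types
        if isinstance(obj, Point):
            return {"x": obj.x, "y": obj.y}
        return super().default(obj)                 # falls back to the usual TypeError

print(json.dumps(Point(1, 2), cls=PointEncoder))    # {"x": 1, "y": 2}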
Another option would be to write a parent class, e.g. JSONSerializable, and have these data types inherit it the way you'd use an interface in some other languages. Making it an abstract base class would make sense, but I doubt that's important to your situation. Define a method on your base class, e.g. def dictify(self), and either implement it if it makes sense to have a default behavior or just have it raise NotImplementedError.
Note that I'm not calling the method serialize, because actual serialization will be handled by json.dump.
from abc import ABC

class JSONSerializable(ABC):
    def dictify(self):
        raise NotImplementedError("Missing dictify implementation!")

class YourDataType(JSONSerializable):
    def __init__(self):
        self.something = None
        # etc etc

    def dictify(self):
        return {"something": self.something}

class YourIncompleteDataType(JSONSerializable):
    # No dictify(self) implementation
    pass
Example usage:
>>> valid = YourDataType()
>>> valid.something = "really something"
>>> valid.dictify()
{'something': 'really something'}
>>>
>>> invalid = YourIncompleteDataType()
>>> invalid.dictify()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in dictify
NotImplementedError: Missing dictify implementation!
Basically, though: You do need to handle this yourself, possibly on a per-type basis, depending on how different your types are. It's just a matter of what method of formatting your types for serialization is the best for your use case.
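For instance, one way (not the only one) to hook dictify() into json.dump, assuming the classes above and a placeholder output path:

import json

valid = YourDataType()
valid.something = "really something"
with open("output.json", "w") as outfile:
    # default= is called for any object json.dump can't serialize on its own
    json.dump(valid, outfile, default=lambda obj: obj.dictify())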

Python __doc__ documentation on instances

I'd like to provide documentation (within my program) on certain dynamically created objects, but still fall back to using their class documentation. Setting __doc__ seems a suitable way to do so. However, I can't find many details in the Python help in this regard, are there any technical problems with providing documentation on an instance? For example:
class MyClass:
    """
    A description of the class goes here.
    """

a = MyClass()
a.__doc__ = "A description of the object"

print( MyClass.__doc__ )
print( a.__doc__ )
__doc__ is documented as a writable attribute for functions, but not for instances of user defined classes. pydoc.help(a), for example, will only consider the __doc__ defined on the type in Python versions < 3.9.
Other protocols (including future use-cases) may reasonably bypass the special attributes defined in the instance dict, too. See Special method lookup section of the datamodel documentation, specifically:
For custom classes, implicit invocations of special methods are only guaranteed to work correctly if defined on an object’s type, not in the object’s instance dictionary.
So, depending on the consumer of the attribute, what you intend to do may not be reliable. Avoid.
A safe and simple alternative is just to use a different attribute name of your own choosing for your own use-case, preferably not using the __dunder__ syntax convention which usually indicates a special name reserved for some specific use by the implementation and/or the stdlib.
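For example, a plain attribute of your own avoids those lookup rules entirely (the name description is just a suggestion):

a = MyClass()
a.description = "A description of this particular object"   # ordinary instance attribute
print(a.description)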
There are some pretty obvious technical problems; the question is whether or not they matter for your use case.
Here are some major uses for docstrings that your idiom will not help with:
help(a): Type help(a) in an interactive terminal, and you get the docstring for MyClass, not the docstring for a.
Auto-generated documentation: Unless you write your own documentation generator, it's not going to understand that you've done anything special with your a value. Many doc generators do have some way to specify help for module and class constants, but I'm not aware of any that will recognize your idiom.
IDE help: Many IDEs will not only auto-complete an expression, but show the relevant docstring in a tooltip. They all do this statically, and without some special-case code designed around your idiom (which they're unlikely to have, given that it's an unusual idiom), they're almost certain to fetch the docstring for the class, not the object.
Here are some where it might help:
Source readability: As a human reading your source, I can tell the intent from the a.__doc__ = … right near the construction of a. Then again, I could tell the same intent just as easily from a Sphinx comment on the constant.
Debugging: pdb doesn't really do much with docstrings, but some GUI debuggers wrapped around it do, and most of them are probably going to show a.__doc__.
Custom dynamic use of docstrings: Obviously any code that you write that does something with a.__doc__ is going to get the instance docstring if you want it to, and therefore can do whatever it wants with it. However, keep in mind that if you want to define your own "protocol", you should use your own name, not one reserved for the implementation.
Notice that most of the same is true for using a descriptor for the docstring:
>>> class C:
...     @property
...     def __doc__(self):
...         return 'C doc'
...
>>> c = C()
If you type c.__doc__, you'll get 'C doc', but help(c) will treat it as an object with no docstring.
It's worth noting that making help work is one of the reasons some dynamic proxy libraries generate new classes on the fly—that is, a proxy to underlying type Spam has some new type like _SpamProxy, instead of the same GenericProxy type used for proxies to Hams and Eggseses. The former allows help(myspam) to show dynamically-generated information about Spam. But I don't know how important a reason it is; often you already need dynamic classes to, e.g., make special method lookup work, at which point adding dynamic docstrings comes for free.
I think it's preferred to keep it under the class via your doc string as it will also aid any developer that works on the code. However if you are doing something dynamic that requires this setup then I don't see any reason why not. Just understand that it adds a level of indirection that makes things less clear to others.
Remember to K.I.S.S. where applicable :)
I just stumbled over this and noticed that at least with python 3.9.5 the behavior seems to have changed.
E.g. using the above example, when I call:
help(a)
I get:
Help on MyClass in module __main__:
<__main__.MyClass object>
A description of the object
Also for reference, have a look at the pydoc implementation which shows:
def _getowndoc(obj):
    """Get the documentation string for an object if it is not
    inherited from its class."""
    try:
        doc = object.__getattribute__(obj, '__doc__')
        if doc is None:
            return None
        if obj is not type:
            typedoc = type(obj).__doc__
            if isinstance(typedoc, str) and typedoc == doc:
                return None
        return doc
    except AttributeError:
        return None

How does open handle context management?

The python built-ins open, and file work with context managers in a way that I don't quite understand.
It is my understanding that open will create a file object. file implements the context-manager methods __enter__ and __exit__. I would initially expect __enter__ to implement the actual opening of the file descriptor.
However, using open outside of a with block will return a file which is already open. So, it appears either file.__init__ or open is actually opening the file descriptor, and as far as I can tell file.__enter__ isn't doing anything. Or maybe file.__init__/open calls file.__enter__ directly?
First question:
What is the execution-flow of the open built-in? What does open handle, what does file.__init__ handle, and what does file.__enter__ handle? How does this work when re-using one file object for multiple cycles of opening/closing the file? How is this different from re-using other contextmanager objects for multiple context-cycles?
Second question:
Objects such as file objects have a setup steps and teardown steps. The setup occurs in __init__ , and the tear-down occurs in either close or __exit__.
Is this a good design pattern? Should this design pattern be implemented for custom functions/context managers?
If you look in _pyio.py (a pure-Python implementation of the io module) you find the following code in class IOBase:
### Context manager ###

def __enter__(self):  # That's a forward reference
    """Context management protocol. Returns self (an instance of IOBase)."""
    self._checkClosed()
    return self

def __exit__(self, *args):
    """Context management protocol. Calls close()"""
    self.close()
This contains the answers to most of your questions. The important thing to understand is that the context manager's job is to ensure that you close the file when you are done with it. It does this simply by calling the close function, which saves you the trouble of doing so.
What does file.__enter__ handle? Nothing. It simply returns to you the file object that was the result of the call to the built-in function open().
How does this work when using one file object for multiple cycles of opening and closing the file? The context manager is not very useful for that purpose, since you must explicitly open the file each time.
Is this a good design pattern? Yes, because it reduces the amount of code you have to write, it's easy to read and understand.
Should this pattern be implemented for custom functions/context managers? Any time you have an object that needs to be cleaned up, or has usage that involves some type of open/close concept, you should consider this pattern. The standard library has many other examples.
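As a minimal sketch of that pattern for a custom resource (the class name is invented for illustration), with setup in __init__, teardown in close(), and __exit__ simply delegating to close():

class ManagedResource:
    def __init__(self, name):
        self.name = name
        self.closed = False       # "setup" happens here, as with file objects

    def close(self):
        self.closed = True        # "teardown" lives in close()

    def __enter__(self):
        return self               # nothing to open; the resource is already usable

    def __exit__(self, *exc_info):
        self.close()              # __exit__ just delegates to close()

with ManagedResource("demo") as r:
    ...                           # close() is called on exit, even if an exception is raised

Because the teardown logic lives in close(), the object still works without a with block for callers who prefer to close it explicitly.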
For Question 1
In CPython, open() does nothing but create a file object, whose underlying C type is PyFileObject (this describes the Python 2 implementation; in Python 3 this machinery lives in the io module). See the source code in bltinmodule.c and fileobject.c:
static PyObject *
builtin_open(PyObject *self, PyObject *args, PyObject *kwds)
{
    return PyObject_Call((PyObject*)&PyFile_Type, args, kwds);
}
file.__init__ opens the file.
file.__enter__ does essentially nothing beyond an emptiness check on the field file.fp.
file.__exit__ invokes the close() method to close the file.
For Question 2
The reason file is designed this way is historical.
open and with arrived in different versions of CPython. with was not introduced until Python 2.5 (see PEP 343); by that time, open had already been in use for a long time.
For our own custom types, we can follow file's design or not, depending on the concrete application context.
For example, threading.Lock follows a different design: construction and acquisition (__enter__) are separate steps.
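A quick illustration of that difference; the same Lock object is constructed once and then acquired and released repeatedly:

import threading

lock = threading.Lock()   # construction does not acquire the lock

with lock:                # __enter__ calls acquire(), __exit__ calls release()
    pass                  # critical section

with lock:                # the same object can be reused for many cycles
    pass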

What's the difference between io.open() and os.open() on Python?

I realised that the open() function I've been using was an alias to io.open() and that importing * from os would shadow it.
What's the difference between opening files through the io module and os module?
io.open() is the preferred, higher-level interface to file I/O. It wraps the OS-level file descriptor in an object that you can use to access the file in a Pythonic manner.
os.open() is just a wrapper for the lower-level POSIX syscall. It takes less symbolic (and more POSIX-y) arguments, and returns the file descriptor (a number) that represents the opened file. It does not return a file object; the returned value will not have read() or write() methods.
From the os.open() documentation:
This function is intended for low-level I/O. For normal usage, use the built-in function open(), which returns a “file object” with read() and write() methods (and many more).
Absolutely everything:
os.open() takes a filename as a string, the file mode as a bitwise mask of attributes, and an optional argument that describes the file permission bits, and returns a file descriptor as an integer.
io.open() takes a filename as a string or a file descriptor as an integer, the file mode as a string, and optional arguments that describe the file encoding, buffering used, how encoding errors and newlines are handled, and if the underlying FD is closed when the file is closed, and returns some descendant of io.IOBase.
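A rough side-by-side of the two (the file name is a placeholder and must exist):

import os

fd = os.open("data.txt", os.O_RDONLY)   # low-level: returns an int file descriptor
chunk = os.read(fd, 100)                # bytes; no readline() or iteration here
os.close(fd)

f = open("data.txt")                    # high-level: returns a file object
line = f.readline()                     # rich, Pythonic interface
f.close()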
os.open is very similar to open() from C in Unix. You're unlikely to want to use it unless you're doing something much more low-level. It gives you an actual file descriptor (as in, a number, not an object).
io.open is your basic Python open() and what you want to use just about all the time.
To add to the existing answers:
I realised that the open() function I've been using was an alias to io.open()
open() == io.open() in Python 3 only. In Python 2 they are different.
While with open() in Python we can obtain an easy-to-use file object with handy read() and write() methods, at the OS level files are accessed using file descriptors (or file handles on Windows). Thus, os.open() must be what is used implicitly under the hood. I haven't examined the Python source code in this regard, but the documentation for the opener parameter, which was added to open() in Python 3.3, says:
A custom opener can be used by passing a callable as opener. The
underlying file descriptor for the file object is then obtained by
calling opener with (file, flags). opener must return an open file
descriptor (passing os.open as opener results in functionality similar
to passing None).
So os.open() is the default opener for open(), and we also have the ability to specify a custom wrapper around it if file flags or mode need to be changed. See the documentation for open() for an example of a custom opener, which opens a file relative to a given directory.
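Roughly, that opener idea looks like this on a POSIX system (directory and file names are placeholders, following the pattern from the docs):

import os

dir_fd = os.open('somedir', os.O_RDONLY)           # directory to resolve names against

def opener(path, flags):
    return os.open(path, flags, dir_fd=dir_fd)     # open relative to somedir

with open('spam.txt', 'w', opener=opener) as f:    # actually writes somedir/spam.txt
    print('hello', file=f)

os.close(dir_fd)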
In Python 2, the built-in open and io.open were different (io.open was newer and supported more things). In Python 3, open and io.open are now the same thing (they got rid of the old built-in open), so you should always use open. Code that needs to be compatible with Python 2 and 3 might have a reason to use io.open.
Below is some code to validate this:
import io

with io.open("../input/files.txt") as f:
    text = f.read().lower()

with open('../input/files.txt', encoding='utf-8') as f2:
    text2 = f2.read().lower()

print(type(f))
print(type(f2))
# <class '_io.TextIOWrapper'>
# <class '_io.TextIOWrapper'>
Database and system application developers usually use open instead of fopen as the former provides finer control on when, what and how the memory content should be written to its backing store (i.e., file on disk).
In Unix-like operating system, open is used to open regular file, socket end-point, device, pipe, etc. A positive file descriptor number is returned for every successful open function call. It provides a consistent API and framework to check for event notification, etc on a variety of these objects.
However, fopen is a standard C function and is normally used to open regular file and return a FILE data structure. fopen, actually, will call open eventually. fopen is good enough for normal usage as developers do not need to worry when to flush or sync memory content to the disk and do not need event notification.
os.open() opens the file path and sets various flags according to flags, possibly setting its mode according to mode.
The default mode is 0o777 (octal), and the current umask value is first masked out.
This method returns the file descriptor for the newly opened file.
While,
io.open() opens a file in the mode specified by the mode string and returns a file object; on error it raises an exception (OSError) rather than returning an error value.
Hope this helps

How do you check if an object is an instance of 'file'?

It used to be in Python (2.6) that one could ask:
isinstance(f, file)
but in Python 3.0 file was removed.
What is the proper method for checking to see if a variable is a file now? The What's New docs don't mention this...
def read_a_file(f):
    try:
        contents = f.read()
    except AttributeError:
        # f is not a file
        pass
substitute whatever methods you plan to use for read. This is optimal if you expect that you will get passed a file like object more than 98% of the time. If you expect that you will be passed a non file like object more often than 2% of the time, then the correct thing to do is:
def read_a_file(f):
    if hasattr(f, 'read'):
        contents = f.read()
    else:
        # f is not a file
        pass
This is exactly what you would do if you did have access to a file class to test against. (and FWIW, I too have file on 2.6) Note that this code works in 3.x as well.
In python3 you could refer to io instead of file and write
import io
isinstance(f, io.IOBase)
Typically, you don't need to check an object type, you could use duck-typing instead i.e., just call f.read() directly and allow the possible exceptions to propagate -- it is either a bug in your code or a bug in the caller code e.g., json.load() raises AttributeError if you give it an object that has no read attribute.
If you need to distinguish between several acceptable input types; you could use hasattr/getattr:
def read(file_or_filename):
    readfile = getattr(file_or_filename, 'read', None)
    if readfile is not None:  # got file
        return readfile()
    with open(file_or_filename) as file:  # got filename
        return file.read()
If you want to support a case where file_or_filename may have a read attribute that is set to None, then you could use try/except over file_or_filename.read -- note: no parens, the call is not made -- e.g., ElementTree._get_writer().
If you want to check certain guarantees e.g., that only one single system call is made (io.RawIOBase.read(n) for n > 0) or there are no short writes (io.BufferedIOBase.write()) or whether read/write methods accept text data (io.TextIOBase) then you could use isinstance() function with ABCs defined in io module e.g., look at how saxutils._gettextwriter() is implemented.
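For instance, a small sketch of those ABC checks (the file name is a placeholder):

import io

f = open("notes.txt", "w")
print(isinstance(f, io.IOBase))      # True for any file object
print(isinstance(f, io.TextIOBase))  # True: text mode, read()/write() deal in str
print(isinstance(f, io.RawIOBase))   # False: this is a TextIOWrapper, not a raw stream
f.close()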
Works for me on python 2.6... Are you in a strange environment where builtins aren't imported by default, or where somebody has done del file, or something?
