python, default parameters/arguments for builtin open?

Right, simple question here. I couldn't find an answer using Google or by searching here.
I need to write a few stubs for built-in functions in Python, like open(name[, mode[, buffering]]). However, I can't seem to find the default values for mode and buffering.
They do not seem to be None.

For making wrappers for the built-ins, what you usually end up doing is something like:
def myOpen(name, mode='r', buffer=None):
    # buffering=0 (unbuffered) is falsy, so test against None explicitly
    if buffer is not None:
        open_file = open(name, mode, buffer)
    else:
        open_file = open(name, mode)
    return open_file
The reason is that not all the arguments are accessible via keyword (buffer in this case).

See the documentation of open():
If mode is omitted, it defaults to 'r'.
...
The optional buffering argument specifies the file’s desired buffer size (...) If omitted, the system default is used.
When it comes to the system default for buffering:
Specifying a buffer size currently has no effect on systems that don’t have setvbuf(). The interface to specify the buffer size is not done using a method that calls setvbuf(), because that may dump core when called after any I/O has been performed, and there’s no reliable way to determine whether this is the case.
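On Python 3 (though not on Python 2, which this question targets), open()'s defaults can be introspected directly; a quick sketch:
import inspect

# Prints the full signature of the built-in open(), defaults included,
# e.g. (file, mode='r', buffering=-1, encoding=None, errors=None, ...)
print(inspect.signature(open))
On Python 2 the C-implemented open() exposes no such signature, so the quoted documentation is the place to look for its defaults.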

Related

Which python module contains file object methods?

While it is simple to search using help for most methods that have a clear help(module.method) arrangement, for example help(list.extend), I cannot work out how to look up the method .readline() in Python's built-in help function.
Which module does .readline belong to? How would I search help for .readline and related methods?
Furthermore, is there any way I can use the interpreter to find out which module a method belongs to in the future?
Don't try to find the module. Make an instance of the class you want, then call help on the method of that instance, and it will find the correct help info for you. Example:
>>> f = open('pathtosomefile')
>>> help(f.readline)
Help on built-in function readline:
readline(size=-1, /) method of _io.TextIOWrapper instance
Read until newline or EOF.
Returns an empty string if EOF is hit immediately.
In my case (Python 3.7.1), it's defined on the type _io.TextIOWrapper (exposed publicly as io.TextIOWrapper, but help doesn't know that), but memorizing that sort of thing isn't very helpful. Knowing how to figure it out by introspecting the specific thing you care about is much more broadly applicable. In this particular case it's extra important not to guess, because open can return a few different classes depending on the arguments provided, including io.BufferedReader, io.BufferedWriter, io.BufferedRandom, and io.FileIO, each with its own version of the readline method (though they all share a similar interface for consistency's sake).
From the text of help(open):
open() returns a file object whose type depends on the mode, and
through which the standard file operations such as reading and writing
are performed. When open() is used to open a file in a text mode ('w',
'r', 'wt', 'rt', etc.), it returns a TextIOWrapper. When used to open
a file in a binary mode, the returned class varies: in read binary
mode, it returns a BufferedReader; in write binary and append binary
modes, it returns a BufferedWriter, and in read/write mode, it returns
a BufferedRandom.
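A quick interactive check confirms this (the filename is hypothetical):
>>> type(open('f.txt', 'w'))
<class '_io.TextIOWrapper'>
>>> type(open('f.txt', 'wb'))
<class '_io.BufferedWriter'>
>>> type(open('f.txt', 'rb'))
<class '_io.BufferedReader'>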
See also the section of python's io module documentation on the class hierarchy.
So you're looking at TextIOWrapper, BufferedReader, BufferedWriter, or BufferedRandom. These all have their own sets of class hierarchies, but suffice it to say that they share the IOBase superclass at some point - that's where the functions readline() and readlines() are declared. Of course, each subclass implements these functions differently for its particular mode - if you do
import io
help(io.TextIOWrapper.readline)
you should get the documentation you're looking for.
In particular, you're having trouble accessing the documentation for whichever version of readline you need because you don't know which class it comes from. You can actually call help on an object as well. If you're working with a particular file object, you can spin up an interpreter, instantiate it, and pass it straight to help(); it'll show you whatever interface is closest to the surface. Example:
x = open('some_file.txt', 'r')
help(x.readline)
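If you also want to pin down which class in the hierarchy actually defines the method, you can walk the MRO; a small sketch (the file path is hypothetical):
x = open('some_file.txt', 'r')
# Walk the method resolution order and report the first class
# whose own namespace defines readline
for cls in type(x).__mro__:
    if 'readline' in vars(cls):
        print(cls)  # <class '_io.TextIOWrapper'>
        break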

How to set the filename encoding globally for a python interpreter?

In a project, we need to open many files all over the place in the source code. We know that all filenames on disk are encoded in UTF-8 and all filenames are processed as Unicode internally.
Is there a way to set the filename encoding globally for the running interpreter, so that we change the following command:
open(filename.encode('utf-8'))
to this simpler version:
open(filename)
This would reduce errors and confusion among the developers. We use Python 2.7.
Assuming this is really what you meant (and not the encoding of the file's contents), this would do it:
open = lambda fname, *a, **kw: __builtins__.open(fname.encode('utf-8'), *a, **kw)
This will only affect modules that include (or import) the redefinition, so it's reasonably safe. But it might be less confusing, and certainly less trouble-prone over the long run, to provide a different command for opening files in your environment for which this applies:
def fsopen(fname, *args, **kwargs):
    """(explain when and why you want this)"""
    return open(fname.encode('utf-8'), *args, **kwargs)
There is no such option, because open's documented behaviour is to not encode strings as UTF-8.
There's a simple reason for this: open is more or less just a delegate of the underlying OS function and must work on any string that the OS function would accept. Hence, it expects raw Python strings, not unicode.
If you need this, implement a wrapper function to do this.
In Python, you can just override the local name -- or the global one, for that matter:
__uni_open = lambda string: __builtins__.open(string.encode("utf-8"))
open = __uni_open
in whatever module all your applications work with.
I don't recommend doing this -- it might break stuff beyond your project's control.

Can I make decode(errors="ignore") the default for all strings in a Python 2.7 program?

I have a Python 2.7 program that writes out data from various external applications. I continually get bitten by exceptions when I write to a file until I add .decode(errors="ignore") to the string being written out. (FWIW, opening the file as mode="wb" doesn't fix this.)
Is there a way to say "ignore encoding errors on all strings in this scope"?
You cannot redefine methods on built-in types, and you cannot change the default value of the errors parameter to str.decode(). There are other ways to achieve the desired behaviour, though.
The slightly nicer way: Define your own decode() function:
def decode(s, encoding="ascii", errors="ignore"):
    # Python 2's str.decode() takes no keyword arguments,
    # so pass encoding and errors positionally
    return s.decode(encoding, errors)
Now, you will need to call decode(s) instead of s.decode(), but that's not too bad, is it?
The hack: You can't change the default value of the errors parameter, but you can overwrite what the handler for the default errors="strict" does:
import codecs
def strict_handler(exception):
    return u"", exception.end
codecs.register_error("strict", strict_handler)
This will essentially change the behaviour of errors="strict" to the standard "ignore" behaviour. Note that this will be a global change, affecting all modules you import.
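A quick interactive check of the effect, with some made-up bytes (Python 2):
>>> '\x41\xff\x42'.decode('ascii')
u'AB'
Without the registered handler, the same call raises UnicodeDecodeError.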
I recommend neither of these two ways. The real solution is to get your encodings right. (I'm well aware that this isn't always possible.)
As mentioned in my thread on the issue, Sven Marnach's hack is even possible without defining a new function:
import codecs
codecs.register_error("strict", codecs.ignore_errors)
I'm not sure what your setup is exactly, but you can derive a class from str and override its decode method:
import sys

class easystr(str):
    def decode(self):
        # Python 2's str.decode() takes no keyword arguments, so the
        # (default) encoding and the errors mode are passed positionally
        return str.decode(self, sys.getdefaultencoding(), "ignore")
If you then convert all incoming strings to easystr, errors will be silently ignored:
line = easystr(input.readline())
That said, decoding a string converts it to unicode, which should never be lossy. Could you figure out which encoding your strings are using and give that as the encoding argument to decode? That would be a better solution (and you can still make it the default in the above way).
Yet another thing you should try is to read your data differently. Do it like this and the decoding errors may well disappear:
import codecs
input = codecs.open(filename, "r", encoding="latin-1") # or whatever

What's the difference between io.open() and os.open() on Python?

I realised that the open() function I've been using is an alias of io.open(), and that importing * from os shadows it.
What's the difference between opening files through the io module and os module?
io.open() is the preferred, higher-level interface to file I/O. It wraps the OS-level file descriptor in an object that you can use to access the file in a Pythonic manner.
os.open() is just a wrapper for the lower-level POSIX syscall. It takes less symbolic (and more POSIX-y) arguments, and returns the file descriptor (a number) that represents the opened file. It does not return a file object; the returned value will not have read() or write() methods.
From the os.open() documentation:
This function is intended for low-level I/O. For normal usage, use the built-in function open(), which returns a “file object” with read() and write() methods (and many more).
Absolutely everything:
os.open() takes a filename as a string, the file mode as a bitwise mask of attributes, and an optional argument that describes the file permission bits, and returns a file descriptor as an integer.
io.open() takes a filename as a string or a file descriptor as an integer, the file mode as a string, and optional arguments that describe the file encoding, buffering used, how encoding errors and newlines are handled, and if the underlying FD is closed when the file is closed, and returns some descendant of io.IOBase.
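A minimal side-by-side sketch of the two (the filename is hypothetical):
import io
import os

# Low level: an integer file descriptor, read with os.read()
fd = os.open('example.txt', os.O_RDONLY)
data = os.read(fd, 100)  # returns raw bytes; fd itself has no .read() method
os.close(fd)

# High level: a file object with read()/write() methods
with io.open('example.txt', 'r', encoding='utf-8') as f:
    text = f.read()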
os.open is very similar to open() from C in Unix. You're unlikely to want to use it unless you're doing something much more low-level. It gives you an actual file descriptor (as in, a number, not an object).
io.open is your basic Python open() and what you want to use just about all the time.
To add to the existing answers:
I realised that the open() function I've been using was an alias to io.open()
open() == io.open() in Python 3 only. In Python 2 they are different.
While open() in Python gives us an easy-to-use file object with handy read() and write() methods, on the OS level files are accessed via file descriptors (or file handles on Windows). So os.open() must be getting used implicitly under the hood. I haven't examined the Python source code in this regard, but the documentation for the opener parameter, which was added to open() in Python 3.3, says:
A custom opener can be used by passing a callable as opener. The
underlying file descriptor for the file object is then obtained by
calling opener with (file, flags). opener must return an open file
descriptor (passing os.open as opener results in functionality similar
to passing None).
So os.open() is the default opener for open(), and we also have the ability to specify a custom wrapper around it if file flags or mode need to be changed. See the documentation for open() for an example of a custom opener, which opens a file relative to a given directory.
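A sketch of such an opener, close to the example in the docs (the directory and file names are hypothetical):
import os

dir_fd = os.open('somedir', os.O_RDONLY)

def opener(path, flags):
    # Resolve path relative to the directory opened above
    return os.open(path, flags, dir_fd=dir_fd)

with open('spam.txt', 'w', opener=opener) as f:
    print('This will be written to somedir/spam.txt', file=f)

os.close(dir_fd)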
In Python 2, the built-in open and io.open were different (io.open was newer and supported more things). In Python 3, open and io.open are now the same thing (they got rid of the old built-in open), so you should always use open. Code that needs to be compatible with Python 2 and 3 might have a reason to use io.open.
The code below validates this:
import io

with io.open("../input/files.txt") as f:
    text = f.read().lower()

with open('../input/files.txt', encoding='utf-8') as f2:
    text2 = f2.read().lower()

print(type(f))
print(type(f2))
# <class '_io.TextIOWrapper'>
# <class '_io.TextIOWrapper'>
Database and system application developers usually use open instead of fopen, as the former provides finer control over when, what, and how memory contents should be written to their backing store (i.e., a file on disk).
In Unix-like operating systems, open is used to open regular files, socket endpoints, devices, pipes, etc. A positive file descriptor number is returned for every successful open call, and it provides a consistent API and framework for event notification and the like across all these kinds of objects.
fopen, however, is a standard C function and is normally used to open regular files, returning a FILE data structure. fopen eventually calls open. fopen is good enough for normal usage, since developers need not worry about when to flush or sync memory contents to disk and do not need event notification.
os.open() opens the file and sets various flags according to flags, and possibly its mode according to mode. The default mode is 0777 (octal), with the current umask value masked out first. It returns the file descriptor for the newly opened file.
io.open(), by contrast, opens a file in the mode given by the string mode and returns a file object; on failure it raises an OSError rather than returning an error value.
Hope this helps

How do you check if an object is an instance of 'file'?

It used to be in Python (2.6) that one could ask:
isinstance(f, file)
but in Python 3.0 file was removed.
What is the proper method for checking whether a variable is a file now? The What's New docs don't mention this...
def read_a_file(f):
    try:
        contents = f.read()
    except AttributeError:
        # f is not a file-like object; handle it here
        pass
Substitute whatever methods you plan to use for read. This is optimal if you expect that you will be passed a file-like object more than 98% of the time. If you expect to be passed a non-file-like object more often than 2% of the time, then the correct thing to do is:
def read_a_file(f):
    if hasattr(f, 'read'):
        contents = f.read()
    else:
        # f is not a file-like object; handle it here
        pass
This is exactly what you would do if you did have access to a file class to test against. (and FWIW, I too have file on 2.6) Note that this code works in 3.x as well.
In Python 3 you can test against io instead of file and write:
import io
isinstance(f, io.IOBase)
Typically, you don't need to check an object type, you could use duck-typing instead i.e., just call f.read() directly and allow the possible exceptions to propagate -- it is either a bug in your code or a bug in the caller code e.g., json.load() raises AttributeError if you give it an object that has no read attribute.
If you need to distinguish between several acceptable input types; you could use hasattr/getattr:
def read(file_or_filename):
    readfile = getattr(file_or_filename, 'read', None)
    if readfile is not None:  # got a file object
        return readfile()
    with open(file_or_filename) as file:  # got a filename
        return file.read()
If you want to support the case where file_or_filename may have a read attribute that is set to None, you could use try/except over file_or_filename.read (note: no parens, the call is not made), e.g., ElementTree._get_writer().
If you want to check certain guarantees e.g., that only one single system call is made (io.RawIOBase.read(n) for n > 0) or there are no short writes (io.BufferedIOBase.write()) or whether read/write methods accept text data (io.TextIOBase) then you could use isinstance() function with ABCs defined in io module e.g., look at how saxutils._gettextwriter() is implemented.
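A small sketch of that isinstance() approach using the io ABCs (the function name is made up for illustration):
import io

def describe_stream(f):
    # Check from the most specific stream type to the least
    if isinstance(f, io.TextIOBase):
        return 'text stream (reads and writes str)'
    if isinstance(f, io.BufferedIOBase):
        return 'buffered binary stream'
    if isinstance(f, io.RawIOBase):
        return 'raw unbuffered stream'
    return 'not an io stream'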
Works for me on Python 2.6... Are you in a strange environment where builtins aren't imported by default, or where somebody has done del file, or something?
