Backporting Python 3 open(encoding="utf-8") to Python 2 - python

I have a Python codebase, built for Python 3, which uses Python 3 style open() with encoding parameter:
https://github.com/miohtama/vvv/blob/master/vvv/textlineplugin.py#L47
with open(fname, "rt", encoding="utf-8") as f:
Now I'd like to backport this code to Python 2.x, so that I would have a codebase which works with Python 2 and Python 3.
What's the recommended strategy to work around open() differences and lack of encoding parameter?
Could I have a Python 3 open() style file handler which streams bytestrings, so it would act like Python 2 open()?

1. To get an encoding parameter in Python 2:
If you only need to support Python 2.6 and 2.7, you can use io.open instead of open. io is the new I/O subsystem for Python 3, and it exists in Python 2.6 and 2.7 as well. Be aware that in Python 2.6 (as well as 3.0) it is implemented purely in Python and is very slow, so if you need speed when reading files, it is not a good option.
If you need speed, and you need to support Python 2.6 or earlier, you can use codecs.open instead. It also has an encoding parameter, and is quite similar to io.open except it handles line-endings differently.
2. To get a Python 3 open() style file handler which streams bytestrings:
open(filename, 'rb')
Note the 'b', meaning 'binary'.
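A minimal sketch of both points, assuming Python 2.6+ or Python 3. The scratch file name demo.txt is made up for the illustration. io.open returns decoded text when given an encoding and plain byte strings in 'rb' mode:

```python
import io
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text mode with an explicit encoding: same call on Python 2.6+ and Python 3.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"Ge\xefrriteerd\n")

with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()   # unicode on Python 2, str on Python 3

# Binary mode: nothing is decoded, the file yields byte strings.
with io.open(path, "rb") as f:
    raw = f.read()    # str on Python 2, bytes on Python 3
```

Decoding raw with "utf-8" gives back the same text (apart from platform line endings), which is essentially what the text-mode call does for you.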

I think
from io import open
should do.

Here's one way:
with open("filename.txt", "rb") as f:
    contents = f.read().decode("UTF-8")
Here's how to do the same thing when writing:
with open("filename.txt", "wb") as f:
    f.write(contents.encode("UTF-8"))

This may do the trick:
import sys
if sys.version_info[0] > 2:
    # py3k
    pass
else:
    # py2
    import codecs
    import warnings
    def open(file, mode='r', buffering=-1, encoding=None,
             errors=None, newline=None, closefd=True, opener=None):
        if newline is not None:
            warnings.warn('newline is not supported in py2')
        if not closefd:
            warnings.warn('closefd is not supported in py2')
        if opener is not None:
            warnings.warn('opener is not supported in py2')
        return codecs.open(filename=file, mode=mode, encoding=encoding,
                           errors=errors, buffering=buffering)
Then you can keep your code in the Python 3 style.
Note that some parameters, such as newline, closefd and opener, are not supported by this wrapper.

If you are using six, you can try the following, which uses the Python 3 API and runs on both Python 2 and 3:
import six
if six.PY2:
    # FileNotFoundError is only available since Python 3.3
    FileNotFoundError = IOError
    from io import open
fname = 'index.rst'
try:
    with open(fname, "rt", encoding="utf-8") as f:
        pass
        # do_something_with_f ...
except FileNotFoundError:
    print('Oops.')
Dropping Python 2 support later is then just a matter of deleting everything related to six.
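The same idea can be sketched without the six dependency, assuming only the standard library. The helper read_text and the file name are hypothetical, and note that on Python 2 the IOError fallback also catches other I/O failures, not just missing files:

```python
import io

# FileNotFoundError only exists on Python 3.3+; fall back to IOError elsewhere.
try:
    FileNotFoundError
except NameError:
    FileNotFoundError = IOError


def read_text(fname):
    """Return the decoded contents of fname, or None if it cannot be found."""
    try:
        with io.open(fname, "rt", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return None


# Hypothetical file name that is not expected to exist.
missing = read_text("no-such-file-for-this-demo.rst")
```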

Not a general answer, but it may be useful for the specific case where you are happy with the default Python 2 encoding but want to specify UTF-8 for Python 3:
import sys
if sys.version_info.major > 2:
    do_open = lambda filename: open(filename, encoding='utf-8')
else:
    do_open = lambda filename: open(filename)
with do_open(filename) as file:
    pass

Related

How to print unicode to both terminal and file redirect

I read everything there is to read about Unicode, UTF-8, encoding/decoding and everything, but I still struggle.
I made a short example snippet to illustrate my problem.
I want to print the string 'Geïrriteerd' just like it is written here. I need to use the following code to let it print properly to a file if I run it with a redirect to a file, like 'Test.py > output'
# coding=utf-8
import codecs
import sys
sys.stdout = codecs.getwriter('UTF-8')(sys.stdout)
print u'Geïrriteerd'
But if I do NOT redirect, the code above prints a garbled version of 'Geïrriteerd' to the terminal.
If I remove the 'codecs.getwriter' line, it prints fine again to the terminal but writes a garbled version of 'Geïrriteerd' to the file.
How can I get this to print properly in both cases?
I am using Python 2.7 on Windows 10. I know Python 3.x handles unicode better in general, but I can't use that in my project (yet) due to other dependencies.
Since redirection is a shell operation, it makes sense to control the encoding using the shell as well. Fortunately, Python provides an environment variable to control the encoding. Given test.py:
#!python2
# coding=utf-8
print u'Geïrriteerd'
To redirect to a file with a particular encoding, use:
C:\>set PYTHONIOENCODING=utf8
C:\>test >out.txt
Running the script normally with PYTHONIOENCODING undefined will use the encoding of the terminal (in my case cp437):
C:\>set PYTHONIOENCODING=
C:\>test
Geïrriteerd
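As a rough illustration of the effect of PYTHONIOENCODING, the variable can also be set programmatically for a child interpreter whose output is captured (which, like a redirect, means stdout is not a terminal). This is only a sketch: the helper run_with_encoding is made up, and it runs whichever interpreter executes it rather than specifically Python 2.

```python
import os
import subprocess
import sys

# The child script is pure ASCII; \xef is the escaped form of the i-diaeresis.
child_code = "print(u'Ge\\xefrriteerd')"


def run_with_encoding(encoding):
    """Run the child with PYTHONIOENCODING set and return its raw stdout bytes."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    return subprocess.check_output([sys.executable, "-c", child_code], env=env)


utf8_bytes = run_with_encoding("utf8").strip()
latin1_bytes = run_with_encoding("latin-1").strip()
```

The same text comes out as different byte sequences depending on the variable, which is why the redirected file changes with the setting.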
Your terminal is set up for cp850 instead of UTF-8.
Run chcp 65001.
http://enwp.org/Chcp_(command)
http://enwp.org/Windows_code_page#List
You need to "encode" your unicode first to write to file or display. You do not really need the codecs module.
The docs provide really good examples for working with unicode.
print type(u'Geïrriteerd')
print type(u'Geïrriteerd'.encode('utf-8'))
print u'Geïrriteerd'.encode('utf-8')
with open('test.txt', 'wb') as f:
    f.write(u'Geïrriteerd'.encode('utf-8'))
with open('test.txt', 'r') as f:
    content = f.read()
    print content
# If you want to use codecs still
import codecs
with codecs.open("test.txt", "w", encoding="utf-8") as f:
    f.write(u'Geïrriteerd')
with open('test.txt', 'r') as f:
    content = f.read()
    print content

tf.contrib.learn load_csv_with_header not working in TensorFlow 1.1

I installed the latest TensorFlow (v1.1.0) and tried to run the tf.contrib.learn Quickstart tutorial, where you are supposed to build a classifier for the Iris data set. However, when I tried:
training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TRAINING,
    target_dtype=np.int,
    features_dtype=np.float32)
I got a StopIteration error.
When I checked the API, I didn't find anything about the load_csv_with_header(). Have they changed it in the latest version without updating the tutorial? How can I fix this?
EDIT:
I use Python3.6 if this makes any difference.
This is because of the difference between Python 2 and Python 3. Here's my code below that works for Python 3.5:
import os
import urllib.request
if not os.path.exists(IRIS_TRAINING):
    raw = urllib.request.urlopen(IRIS_TRAINING_URL).read().decode()
    with open(IRIS_TRAINING, 'w') as f:
        f.write(raw)
if not os.path.exists(IRIS_TEST):
    raw = urllib.request.urlopen(IRIS_TEST_URL).read().decode()
    with open(IRIS_TEST, 'w') as f:
        f.write(raw)
What probably happened is that your code created a file named after IRIS_TRAINING, but the file is empty. Thus StopIteration is raised. If you look into the implementation of load_csv_with_header:
with gfile.Open(filename) as csv_file:
    data_file = csv.reader(csv_file)
    header = next(data_file)
StopIteration is raised when next() finds no further items to read, as documented at https://docs.python.org/3.5/library/exceptions.html#StopIteration
Note the change in my code compared to the Python 2 version as shown in Tensorflow tutorial:
urllib.request.urlopen instead of urllib.urlopen
decode() is performed after read()
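The empty-file failure described above can be reproduced in isolation with the standard csv module. This is a small sketch, not the TensorFlow code itself: the helper read_header is made up, and the sample rows are only illustrative.

```python
import csv
import io


def read_header(csv_text):
    """Return the first CSV row, or None when the input has no rows at all."""
    reader = csv.reader(io.StringIO(csv_text))
    try:
        return next(reader)
    except StopIteration:
        # This is the exception that escapes when the CSV file is empty.
        return None


empty_header = read_header(u"")
sample_header = read_header(u"120,4,setosa,versicolor,virginica\n6.4,2.8,5.6,2.2,2\n")
```

An empty download therefore fails on the very first next() call, before any data row is parsed.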
StopIteration should only happen there if the csv file is empty. Did you check that that path (IRIS_TRAINING) resolves to something you have permission to open?
Alternatively, you can write the CSV file as binary instead of adding decode():
if not os.path.exists(IRIS_TRAINING):
    raw = urllib.request.urlopen(IRIS_TRAINING_URL).read()
    with open(IRIS_TRAINING, 'wb') as f:
        f.write(raw)
If the answer above doesn't work, you can specify the path of your iris_training.csv and iris_test.csv files in the urlopen() method.

redirect sys.stdout to a file without buffering in Python 3

I have a script that writes to a log file. In Python 2, my quick solution to allow tailing/viewing of the log as it progressed was by assigning sys.stdout to a file object with buffering set to 0:
original_stdout = sys.stdout
sys.stdout = open(log_file, 'w', 0)
Once set, any print statements in the script's functions redirect to the log file very nicely.
Running the 2to3-converted version under Python 3 gives the following error: ValueError: can't have unbuffered text I/O. Changing the 'w' above to 'wb' solves that, so the structure of the block is
original_stdout = sys.stdout
sys.stdout = open(log_file, 'wb', 0)
print("{}".format(first_message))
but now the first print statement errors with TypeError: 'str' does not support the buffer interface. I tried explicitly casting the string to bytes
print(bytes("{}".format(first_message), "UTF-8"))
but that produces the same TypeError as before.
What is the easiest way to write unbuffered text to a file in Python 3?
According to the Python 3.4.3 documentation at https://docs.python.org/3/library/io.html#raw-i-o and the 3.5 documentation at https://docs.python.org/3.5/library/io.html#raw-i-o, the way to get unbuffered I/O is raw I/O, which can be enabled like this:
f = open("myfile.jpg", "rb", buffering=0)
That means "wb" should work for writing.
Details on Raw IO are at https://docs.python.org/3/library/io.html#io.RawIOBase and https://docs.python.org/3.5/library/io.html#io.RawIOBase which appear to be the same.
I did some testing and found the buffering of text I/O to be severe: it can amount to hundreds of lines, and this happens even when writing to sys.stderr and redirecting the error output to a file, on Windows 7 at least. Then I tried raw I/O and it worked great! Each line printed came through immediately and in plain text in tail -f output. This is what worked for me on Windows 7 with Python 3.4.3 and using tail bundled with GitHub tools:
import time
import sys
f = open("myfile.txt", "ab", buffering=0)
c = 0
while True:
    f.write(bytes("count is " + str(c) + '\n','utf-8'))
    c += 1
    time.sleep(1)
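If you would rather keep using print() with ordinary strings, one possible variation is to wrap the raw binary file in an io.TextIOWrapper with write_through=True. This is a sketch for Python 3 only, and the log path is made up for the illustration:

```python
import io
import os
import tempfile

# Hypothetical log path, used only for this illustration.
log_path = os.path.join(tempfile.mkdtemp(), "demo.log")

# Raw (unbuffered) binary file, wrapped so that it accepts str instead of bytes.
raw = open(log_path, "wb", buffering=0)
log = io.TextIOWrapper(raw, encoding="utf-8", write_through=True)

print("first message", file=log)

# Read through a second handle while the log is still open.
with open(log_path, "r", encoding="utf-8") as reader:
    seen_before_close = reader.read()

log.close()  # also closes the underlying raw file
```

Assigning such a wrapper to sys.stdout should give the behaviour the question asks for, with the encoding handled by the wrapper rather than by hand.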
If by unbuffered you mean having the outputs immediately flushed to disk, you can simply do this:
original_stdout = sys.stdout
sys.stdout = open(log_file, 'w')
print(log_message, flush=True)
As print is now a first-class function you can also specify which file to print to, such as:
fd = open(log_file, 'w')
print(log_message, file=fd, flush=True)
The issue seems to be in the way you open the file -
open(log_file, 'w', 0)
From Python 3.x documentation -
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
The third argument for open() determines the buffering mode for the file; 0 means no buffering, which Python 3 only allows in binary mode. I do not think you can make it work by just using 'wb' instead of 'w', because print() writes str, not bytes.
You should remove that 0 third argument and let open() use its default buffering for text files (pass buffering=1 if you want line buffering). Example -
open(log_file, 'w')
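A short sketch of the line-buffered variant, assuming Python 3 and a made-up log path. buffering=1 is only meaningful in text mode, and flush=True on print() covers output that does not end in a newline:

```python
import os
import tempfile

# Hypothetical log path, used only for this illustration.
log_path = os.path.join(tempfile.mkdtemp(), "demo.log")

# buffering=1 selects line buffering for a text-mode file.
log = open(log_path, "w", buffering=1)
log.write("line one\n")   # flushed because it ends with a newline

with open(log_path) as reader:
    after_newline = reader.read()

log.write("partial")      # no newline yet, so it may still sit in the buffer
print(" line two", file=log, flush=True)   # flush=True pushes everything out

with open(log_path) as reader:
    after_flush = reader.read()

log.close()
```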

Python zipfile, How to set the compression level?

Python supports zipping files when zlib is available (ZIP_DEFLATED); see:
https://docs.python.org/3.4/library/zipfile.html
The zip command-line program on Linux supports -1 fastest, -9 best.
Is there a way to set the compression level of a zip file created in Python's zipfile module?
Starting with Python 3.7, the zipfile module has a compresslevel parameter.
https://docs.python.org/3/library/zipfile.html
I know this question is dated, but for people like me, that fall in this question, it may be a better option than the accepted one.
Before Python 3.7, the zipfile module does not provide this. During compression it uses a constant from zlib, Z_DEFAULT_COMPRESSION, which equals -1 by default. So, as a possible workaround, you can try changing this constant manually.
Python 3.7+ answer: If you look at the zipfile.ZipFile constructor you'll see this:
def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=True,
             compresslevel=None):
    """Open the ZIP file with mode read 'r', write 'w', exclusive create 'x',
    or append 'a'.
    ...
    compression: ZIP_STORED (no compression), ZIP_DEFLATED (requires zlib),
                 ZIP_BZIP2 (requires bz2) or ZIP_LZMA (requires lzma).
    compresslevel: None (default for the given compression type) or an integer
                   specifying the level to pass to the compressor.
                   When using ZIP_STORED or ZIP_LZMA this keyword has no effect.
                   When using ZIP_DEFLATED integers 0 through 9 are accepted.
                   When using ZIP_BZIP2 integers 1 through 9 are accepted.
    """
which means you can pass the desired compression level in the constructor:
myzip = zipfile.ZipFile(file_handle, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9)
See also https://docs.python.org/3/library/zipfile.html
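To see what the parameter does in practice, here is a small sketch for Python 3.7+ that writes the same data to in-memory archives at several DEFLATE levels and records the compressed size. The payload, the member name sample.txt, and the helper zipped_size are all made up for the illustration:

```python
import io
import zipfile

# Repetitive sample data so that DEFLATE has something to compress.
payload = b"the quick brown fox jumps over the lazy dog\n" * 2000


def zipped_size(level):
    """Return the compressed size of payload at the given DEFLATE level."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=level) as zf:
        zf.writestr("sample.txt", payload)
    with zipfile.ZipFile(buf) as zf:
        assert zf.read("sample.txt") == payload  # round-trip check
        return zf.getinfo("sample.txt").compress_size


sizes = {level: zipped_size(level) for level in (1, 6, 9)}
```

How much the sizes differ between levels depends heavily on the data, so it is worth measuring on your own files before settling on a level.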

reading and writing into file, proper indenting and syntax highlighting

I am trying to read from one file and write to another file using:
with open('example2') as inpf, open('outputrt','w') as outf:
    for l in inpf:
        outf.write(l)
But I am getting a syntax error at the line 1 i.e.
"with open('example2') as inpf, open('outputrt','w') as outf:" pointing at "inpf,"
My python version is 2.6. Is there an error in syntax?
That syntax is only supported in 2.7+.
In 2.6 you can do:
import contextlib
with contextlib.nested(open('example2'), open('outputrt','w')) as (inpf, outf):
    for l in inpf:
        outf.write(l)
Or it might look cleaner to you to do this (this would be my preference):
with open('example2') as inpf:
    with open('outputrt','w') as outf:
        for l in inpf:
            outf.write(l)
In Python versions <= 2.6, you can use
inPipe = open("example2", "r")
outPipe = open("outputrt", "w")
for k in inPipe:
    outPipe.write(k)
inPipe.close()
outPipe.close()
