Why not monkey patch sys.getfilesystemencoding()? - python

In Python you can read the filesystem encoding with sys.getfilesystemencoding().
But there seems to be no official way to set the filesystem encoding.
See: How to change file system encoding via python?
I found this dirty hack:
import sys
sys.getfilesystemencoding = lambda: 'UTF-8'
Is there a better solution, if changing the LANG environment variable before starting the interpreter is not an option?
Background, why I want this:
This works:
user#host:~$ python src/setfilesystemencoding.py
LANG: de_DE.UTF-8
sys.getdefaultencoding(): ascii
sys.getfilesystemencoding(): UTF-8
This does not work:
user#host:~$ LANG=C python src/setfilesystemencoding.py
LANG: C
sys.getdefaultencoding(): ascii
sys.getfilesystemencoding(): ANSI_X3.4-1968
Traceback (most recent call last):
File "src/setfilesystemencoding.py", line 10, in <module>
with open('/tmp/german-umlauts-üöä', 'wb') as fd:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 20-22: ordinal not in range(128)
Here is the simple script:
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, unicode_literals, print_function
import os, sys
print('LANG: {}'.format(os.environ['LANG']))
print('sys.getdefaultencoding(): {}'.format(sys.getdefaultencoding()))
print('sys.getfilesystemencoding(): {}'.format(sys.getfilesystemencoding()))
with open('/tmp/german-umlauts-üöä', 'wb') as fd:
    fd.write('foo')
I hoped that the above monkey patching would solve this, but it doesn't. Sorry, this question doesn't make sense any more; I'm closing it.
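Why the patch can't work: it only rebinds the Python-level function that reports the encoding. The interpreter's own file APIs (open() with a unicode file name, and in Python 3 also os.fsencode(), which captures the encoding when the os module is imported) keep using the codec chosen at startup. A minimal sketch for Python 3.6+ that demonstrates this; 'latin-1' is only an arbitrary stand-in value, and the original function is restored afterwards:

```python
import os
import sys

original_func = sys.getfilesystemencoding
real_encoding = original_func()                # codec the interpreter picked at startup
real_errors = sys.getfilesystemencodeerrors()  # matching error handler (Python 3.6+)

# The "dirty hack" from the question, with an arbitrary stand-in value.
sys.getfilesystemencoding = lambda: 'latin-1'
try:
    patched_report = sys.getfilesystemencoding()  # now reports 'latin-1'
    # os.fsencode() closed over the real codec when os was imported,
    # so it ignores the patched function.
    encoded = os.fsencode('caf\u00e9')
finally:
    sys.getfilesystemencoding = original_func     # undo the patch

print(patched_report)
print(encoded == 'caf\u00e9'.encode(real_encoding, real_errors))
```

The first line reports the patched value, while the second confirms that the bytes actually used for file names still come from the startup codec.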
My solution: use LANG=C.UTF-8
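For reference, a sketch (Python 3) that starts a fresh interpreter with LANG=C.UTF-8 and asks it for its filesystem encoding. Since Python 3.7, PEP 538 and PEP 540 also coerce a plain LANG=C to UTF-8, so this mainly matters for Python 2 and older Python 3 releases; on Windows and macOS, Python 3 uses UTF-8 for file names regardless of LANG. The helper name child_fs_encoding is hypothetical. LC_ALL and LC_CTYPE are removed from the child's environment because they take precedence over LANG, and PYTHONUTF8 because it forces or disables UTF-8 mode independently of the locale.

```python
import os
import subprocess
import sys

def child_fs_encoding(lang):
    """Run a new interpreter with LANG=lang and return its filesystem encoding."""
    env = dict(os.environ, LANG=lang)
    # These override LANG (or toggle UTF-8 mode directly), so drop them.
    for name in ('LC_ALL', 'LC_CTYPE', 'PYTHONUTF8'):
        env.pop(name, None)
    out = subprocess.check_output(
        [sys.executable, '-c', 'import sys; print(sys.getfilesystemencoding())'],
        env=env,
    )
    return out.decode('ascii').strip()

print(child_fs_encoding('C.UTF-8'))
```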

Related

Python printing unicode characters instead of ASCII

I want to print ASCII art, but when I run the script, it throws an error:
$ python test.py
Traceback (most recent call last):
  File "C:\Users\wooxh\Desktop\Materialy\XRichPresence\test.py", line 1, in <module>
    print("""
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2032.0_x64__qbz5n2kfra8p0\lib\encodings\cp1250.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 2-4: character maps to <undefined>
Here's the code
print("""
██╗ ██╗██████╗ ██████╗ ██████╗
╚██╗██╔╝██╔══██╗██╔══██╗██╔════╝
╚███╔╝ ██████╔╝██████╔╝██║
██╔██╗ ██╔══██╗██╔═══╝ ██║
██╔╝ ██╗██║ ██║██║ ╚██████╗
╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═════╝
""")
Looks like Python is identifying your code page as 1250, which doesn't include the characters you're using. If chcp reports you're actually using code page 437 (common in cmd.exe) you can do:
import sys
sys.stdout.buffer.write("""
██╗ ██╗██████╗ ██████╗ ██████╗
╚██╗██╔╝██╔══██╗██╔══██╗██╔════╝
╚███╔╝ ██████╔╝██████╔╝██║
██╔██╗ ██╔══██╗██╔═══╝ ██║
██╔╝ ██╗██║ ██║██║ ╚██████╗
╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═════╝
""".encode('cp437'))
to explicitly encode to the correct code page and write it. Otherwise, I'd suggest enabling Python's forced UTF-8 runtime mode, which should allow your original code (with no call to encode) to work (possibly dropping or replacing characters not representable by the terminal). All you'd change is your run command:
> python -X utf8 test.py
or explicitly define PYTHONUTF8=1 in your environment to turn it on without a command-line switch.
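Whether a code page can represent the banner is easy to check by trying to encode a few of its characters. A small Python 3 sketch; can_encode is a hypothetical helper, and the six characters (full block plus box-drawing corners and lines) are taken from the banner above:

```python
# Full block and box-drawing characters used in the banner.
BANNER_CHARS = '\u2588\u2557\u2554\u255a\u255d\u2550'

def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

for codec in ('cp1250', 'cp437', 'utf-8'):
    print(codec, can_encode(BANNER_CHARS, codec))
```

With these characters, cp1250 reports False while cp437 and utf-8 report True, which is consistent with the traceback coming from encodings\cp1250.py.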

What character set is "é" from? (Python: Filename with "é", how to use os.path.exists , filecmp.cmp, shutil.move?)

What character set is é from? In Windows Notepad, a file containing this character saves fine as an ANSI text file, but insert something like 😍 and you get an error. é also seems to work fine in an ASCII terminal in PuTTY (Are CP437 and IBM437 the same?), whereas 😍 does not.
I can see that 😍 is Unicode, not ASCII. But what is é? It doesn't cause the errors I get with other Unicode characters in Notepad, but Python was throwing SyntaxError: Non-ASCII character '\xc3' in file on line , but no encoding declared; before I added a "magic comment" as suggested by Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP).
I added the "magic comment" and no longer get that error, but os.path.isfile() says a file name containing é doesn't exist. Ironically, é appears in the name of Marc-André Lemburg, the author of the PEP the error links to.
EDIT: If I print the path of the file, the accented e shows up as ├⌐, but I can copy and paste é into the command prompt.
EDIT2: See below
Private > cat scratch.py ### LOL cat scratch :3
# coding=utf-8
file_name = r"Filéname"
file_name = unicode(file_name)
Private > python scratch.py
Traceback (most recent call last):
File "scratch.py", line 3, in <module>
file_name = unicode(file_name)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
Private >
EDIT3:
Private > PS1="Private > " ; echo code below ; cat scratch.py ; echo ======= ; echo output below ; python scratch.py
code below
# -*- coding: utf-8 -*-
file_name = r"Filéname"
file_name = unicode(file_name, encoding="utf-8")
# I have code here to determine a path depending on the hostname of the
# machine, the folder paths contain no Unicode characters, for my debug
# version of the script, I will hardcode the redacted hostname.
hostname = "One"
if hostname == "One":
    folder = "C:/path/folder_one"
elif hostname == "Two":
    folder = "C:/path/folder_two"
else:
    folder = "C:/path/folder_three"
path = "%s/%s" % (folder, file_name)
path = unicode(path, encoding="utf-8")
print path
=======
output below
Traceback (most recent call last):
File "scratch.py", line 18, in <module>
path = unicode(path, encoding="utf-8")
TypeError: decoding Unicode is not supported
Private >
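You need to tell unicode() which encoding the byte string is in; in this case it's UTF-8, not ASCII. The source file also needs an encoding declaration such as # -*- coding: utf-8 -*- (see Encoding Declarations; the shorter # coding=utf-8 also matches the PEP 263 pattern):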
# -*- coding: utf-8 -*-
file_name = r"Filéname"
file_name = unicode(file_name, encoding="utf-8")
Help on class unicode in module __builtin__:

class unicode(basestring)
 |  unicode(object='') -> unicode object
 |  unicode(string[, encoding[, errors]]) -> unicode object
 |
 |  Create a new Unicode object from the given encoded string.
 |  encoding defaults to the current default string encoding.
 |  errors can be 'strict', 'replace' or 'ignore' and defaults to 'strict'.
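Once file_name is a unicode object, "%s/%s" % (folder, file_name) already produces unicode, so the second unicode(path, encoding="utf-8") call in EDIT3 raises TypeError: decoding Unicode is not supported; just drop it. And, as I mentioned in my previous comment, you will save yourself a lot of headaches by switching to Python 3. Python 2 on a Windows filesystem with Unicode characters can be a nightmare.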
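To address the title question directly: é is U+00E9, LATIN SMALL LETTER E WITH ACUTE. It is as much a Unicode character as 😍 (U+1F60D), but unlike the emoji it also exists in the common single-byte code pages: Windows-1252 (assumed here to be the "ANSI" code page Notepad uses, as on Western-European Windows), Latin-1 and CP437. That is why Notepad can save it in an ANSI file. The ├⌐ from the first EDIT is é's two UTF-8 bytes, C3 A9, displayed through code page 437. A small Python 3 sketch that checks these points; encoded_or_none is a hypothetical helper:

```python
import codecs

E_ACUTE = '\u00e9'      # LATIN SMALL LETTER E WITH ACUTE
EMOJI = '\U0001F60D'    # SMILING FACE WITH HEART-SHAPED EYES

def encoded_or_none(ch, codec):
    """Return ch encoded with codec, or None if the codec has no such character."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None

for codec in ('cp1252', 'latin-1', 'cp437', 'utf-8'):
    print(codec, encoded_or_none(E_ACUTE, codec), encoded_or_none(EMOJI, codec))

# The UTF-8 bytes of e-acute (C3 A9) read as code page 437 are U+251C U+2310.
mojibake = E_ACUTE.encode('utf-8').decode('cp437')
print(ascii(mojibake))

# Python's codec registry treats IBM437 as another name for cp437.
print(codecs.lookup('IBM437').name)
```

For é, cp1252 and latin-1 give b'\xe9', cp437 gives b'\x82' and utf-8 gives b'\xc3\xa9'; only utf-8 has bytes for the emoji (b'\xf0\x9f\x98\x8d').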

PyQt4 Pyrcc Resource File Encoding

I am trying to embed resources to my PyQt4 application.
However, when I import the resource file that I created with the included pyrcc.exe, I run into encoding errors.
Pyrcc already adds the UTF-8 encoding line at the top of the file, since that was the usual fix I saw suggested online for this error.
SyntaxError: Non-ASCII character '\xff' in file ... on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
The page the error links to recommends adding an encoding line to the file, which I already have:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Resource object code
#
# Created: Mon Sep 14 10:40:57 2015
# by: The Resource Compiler for PyQt (Qt v4.8.6)
#
# WARNING! All changes made in this file will be lost!
from PyQt4 import QtCore
qt_resource_data = "\
\x00\x00\x06\xdf\
\x89\
\x50\x4e\x47\x0d\x0a\x1a\x0a\x00\x00\x00\x0d\x49\x48\x44\x52\x00\
\x00\x00\x40\x00\x00\x00\x40\x08\x06\x00\x00\x00\xaa\x69\x71\xde\
\x00\x00\x06\xa6\x49\x44\x41\x54\x78\x5e\xed\x5b\x5d\x72\xda\x48\...
Any suggestions would be appreciated. Thanks !
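The '\xff' in the message is the very first byte of line 1, and a coding declaration can't help if the file itself isn't in an ASCII-compatible encoding, because Python has to read the declaration first. One common way to get 0xFF there is a UTF-16 byte-order mark (FF FE): for example, Windows PowerShell 5.x typically writes output redirected with > as UTF-16. That cause is an assumption, not something the question confirms. If it applies, letting pyrcc4 write the file itself with its -o option instead of using shell redirection should avoid it. The sketch below (hypothetical helpers detect_bom and fix_utf16_file; runs on Python 2.7 and 3) checks a generated file for a BOM and rewrites a UTF-16 file as UTF-8:

```python
import codecs

# Order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32-le'),
    (codecs.BOM_UTF32_BE, 'utf-32-be'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
]

def detect_bom(path):
    """Return the name of the BOM at the start of path, or None if there is none."""
    with open(path, 'rb') as f:
        head = f.read(4)
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None

def fix_utf16_file(path):
    """Rewrite path as UTF-8 if it starts with a UTF-16 BOM; return True if converted."""
    if detect_bom(path) not in ('utf-16-le', 'utf-16-be'):
        return False
    with open(path, 'rb') as f:
        # The 'utf-16' codec reads the BOM to pick the byte order and strips it.
        text = f.read().decode('utf-16')
    with open(path, 'wb') as f:
        f.write(text.encode('utf-8'))
    return True
```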

(python/boto sqs) UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)

I cannot send messages containing accented characters to SQS in Python with the AWS SDK (boto).
Versions
Python: 2.7.6
boto: 2.20.1
CODE
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import boto.sqs
from boto.sqs.message import RawMessage
# print boto.Version
sqs_conn = boto.sqs.connect_to_region(
'my_region',
aws_access_key_id='my_kye',
aws_secret_access_key='my_secret_ky')
queue = sqs_conn.get_queue('my_queue')
queue.set_message_class(RawMessage)
msg = RawMessage()
body = '1 café, 2 cafés, 3 cafés ...'
msg.set_body(body)
queue.write(msg)
One solution:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
Full code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import boto.sqs
from boto.sqs.message import RawMessage
import sys # <== added this line
reload(sys) # <== added this line
sys.setdefaultencoding('utf-8') # <== added this line
# print boto.Version
sqs_conn = boto.sqs.connect_to_region(
'my_region',
aws_access_key_id='my_kye',
aws_secret_access_key='my_secret_ky')
queue = sqs_conn.get_queue('my_queue')
queue.set_message_class(RawMessage)
msg = RawMessage()
body = '1 café, 2 cafés, 3 cafés ...'
msg.set_body(body)
queue.write(msg)
Source: https://pythonadventures.wordpress.com/2012/03/29/ascii-codec-cant-encode-character/#comment-4672
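The position in the error is a clue: '1 caf' occupies bytes 0 to 4, so byte 5 is 0xc3, the first of the two UTF-8 bytes of é. In Python 2 a '...' literal in a UTF-8 source file is a byte str, and combining it with unicode triggers an implicit decode with the default ASCII codec, which is exactly what sys.setdefaultencoding('utf-8') changes. Where inside boto 2.20.1 that mixing happens is an assumption not traced here. The sketch below reproduces the same error message in Python 3 by performing that ASCII decode explicitly; implicit_ascii_decode is a hypothetical helper:

```python
# The message body, written with escapes: \u00e9 is e-acute.
body = '1 caf\u00e9, 2 caf\u00e9s, 3 caf\u00e9s ...'
body_bytes = body.encode('utf-8')  # what a Python 2 '...' literal holds in a UTF-8 file

def implicit_ascii_decode(data):
    """Mimic Python 2's implicit str -> unicode coercion, which uses ASCII."""
    try:
        return data.decode('ascii'), None
    except UnicodeDecodeError as exc:
        return None, exc

text, error = implicit_ascii_decode(body_bytes)
print(error)
print(body_bytes.decode('utf-8') == body)  # an explicit UTF-8 decode round-trips
```

Decoding explicitly (body.decode('utf-8'), or writing the literal as u'1 café, ...' in Python 2) avoids relying on a process-wide default encoding; whether boto then sends a unicode body correctly is not something this sketch verifies.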

Decoding with 'utf-8' but the error is a UnicodeEncodeError?

My Python 2.x script tries to download a web page containing Chinese words. It's encoded in UTF-8. With urllib.urlopen(url) I get the content as type str, so I decode it with UTF-8. It throws UnicodeEncodeError. I googled a lot of posts like this and this, but they don't work for me. Am I misunderstanding something?
My code is:
import urllib
import httplib
def get_html_content(url):
    response = urllib.urlopen(url)
    html = response.read()
    print type(html)
    return html

if __name__ == '__main__':
    url = 'http://weekly.manong.io/issues/58'
    html = get_html_content(url)
    print html.decode('utf-8')
Error message:
<type 'str'>
Traceback (most recent call last):
File "E:\src\infra.py", line 32, in <module>
print html.decode('utf-8')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 44: ordinal not in range(128)
[Finished in 1.6s]
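The print statement converts its arguments to str (byte strings); when Python 2 doesn't know the output's encoding, it encodes unicode with the default ASCII codec, which fails here. Encoding the text manually prevents that implicit ASCII encoding: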
import sys
...
if __name__ == '__main__':
    url = 'http://weekly.manong.io/issues/58'
    html = get_html_content(url)
    print html.decode('utf-8').encode(sys.stdout.encoding, 'ignore')
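If that doesn't print correctly, replace sys.stdout.encoding with your terminal's encoding. (When output is redirected, Python 2 sets sys.stdout.encoding to None, so you have to name the encoding explicitly.)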
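The same idea can be wrapped in a helper that falls back to UTF-8 when the stream reports no encoding. A minimal sketch that runs on Python 2 and 3; encode_for_output is a hypothetical name, and the default 'replace' handler is a deliberate swap for the answer's 'ignore' so that lost characters show up as ? instead of vanishing:

```python
import sys

def encode_for_output(text, stream=None, errors='replace'):
    """Encode unicode text for stream, like html.decode('utf-8').encode(sys.stdout.encoding, ...)."""
    if stream is None:
        stream = sys.stdout
    # stream.encoding is None when Python 2's stdout is redirected; fall back to UTF-8.
    encoding = getattr(stream, 'encoding', None) or 'utf-8'
    return text.encode(encoding, errors)
```

In Python 2 you would print the returned bytes directly; in Python 3 you would write them with sys.stdout.buffer.write(...). Pass errors='ignore' to reproduce the answer's behaviour exactly.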
UPDATE
Alternatively, you can set the PYTHONIOENCODING environment variable instead of encoding in the source code:
PYTHONIOENCODING=utf-8:ignore python program.py
If standard output is redirected to a pipe, Python 2 doesn't use your locale's encoding:
⟫ python -c'print u"\u201c"' # no redirection -- works
“
⟫ python -c'print u"\u201c"' | cat
Traceback (most recent call last):
File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 0: ordinal not in range(128)
To fix it, you could set the PYTHONIOENCODING environment variable, e.g. in bash:
⟫ PYTHONIOENCODING=utf-8 python -c'print u"\u201c"' | cat
“
On Windows, the syntax for setting an environment variable is different (e.g. set PYTHONIOENCODING=utf-8 in cmd.exe).
If your Windows console doesn't support UTF-8 (this matters only for the first command, where there is no redirection), you could try to print Unicode directly using Win32 API calls, as win-unicode-console does. See windows console doesn't print or input Unicode.
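The effect of PYTHONIOENCODING on piped output can be checked from Python itself. A sketch written for Python 3.5+ (the variable, including the :errorhandler suffix, works the same way there) that runs a child interpreter with stdout connected to a pipe and prints U+201C; run_child is a hypothetical helper name:

```python
import os
import subprocess
import sys

# The child prints U+201C LEFT DOUBLE QUOTATION MARK (the escape is decoded by the child).
CHILD_CODE = 'print("\\u201c")'

def run_child(io_encoding):
    """Run a child interpreter whose stdout is a pipe, with PYTHONIOENCODING set."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, '-c', CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )

ok = run_child('utf-8')
strict = run_child('ascii')
lenient = run_child('ascii:ignore')
print(ok.returncode, ok.stdout.strip())
print(strict.returncode)
print(lenient.returncode, lenient.stdout.strip())
```

The utf-8 run exits with 0 and writes the three bytes e2 80 9c, the strict ascii run exits non-zero with a UnicodeEncodeError on stderr, and ascii:ignore exits with 0 after silently dropping the character.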
