Unicode characters in the Python CLI [duplicate] - python

When I try to print a Unicode string in a Windows console, I get an error:
UnicodeEncodeError: 'charmap' codec can't encode character ....
I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this?
Is there any way I can make Python automatically print a ? instead of failing in this situation?
Edit: I'm using Python 2.5.
Note: @LasseV.Karlsen's answer with the checkmark is somewhat outdated (from 2008). Please use the solutions/answers/suggestions below with care!
@JFSebastian's answer is more relevant as of today (6 Jan 2016).

Update: Python 3.6 implements PEP 528: Change Windows console encoding to UTF-8: the default console on Windows will now accept all Unicode characters. Internally, it uses the same Unicode API as the win-unicode-console package mentioned below. print(unicode_string) should just work now.
I get a UnicodeEncodeError: 'charmap' codec can't encode character... error.
The error means that the Unicode characters you are trying to print can't be represented in the current (chcp) console character encoding. The code page is often an 8-bit encoding such as cp437, which can represent only ~0x100 characters out of ~1M Unicode characters:
>>> u"\N{EURO SIGN}".encode('cp437')
Traceback (most recent call last):
...
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in position 0:
character maps to <undefined>
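To make the limitation concrete, here is a minimal sketch that tries the same character against a few codecs. The helper name try_encode is made up for this illustration; the codecs themselves ship with Python.

```python
# Compare how two legacy code pages and UTF-8 handle the euro sign.
euro = "\N{EURO SIGN}"


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


print(try_encode(euro, "cp437"))   # cp437 has no euro sign, so this is None
print(try_encode(euro, "cp1252"))  # cp1252 maps it to a single byte
print(try_encode(euro, "utf-8"))   # UTF-8 can encode any code point

# With errors="replace" the codec substitutes '?' instead of raising.
print(euro.encode("cp437", errors="replace"))
```

Whether print() fails therefore depends on the encoding attached to the output stream, not on the string itself.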
I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this?
The Windows console does accept Unicode characters, and it can even display them (BMP only) if a suitable font is configured. The WriteConsoleW() API should be used, as suggested in @Daira Hopwood's answer. It can be called transparently, i.e., you don't need to and should not modify your scripts if you use the win-unicode-console package:
T:\> py -m pip install win-unicode-console
T:\> py -m run your_script.py
See What's the deal with Python 3.4, Unicode, different languages and Windows?
Is there any way I can make Python automatically print a ? instead of failing in this situation?
If it is enough in your case to replace all unencodable characters with ?, then you could set the PYTHONIOENCODING environment variable:
T:\> set PYTHONIOENCODING=:replace
T:\> python3 -c "print(u'[\N{EURO SIGN}]')"
[?]
In Python 3.6+, the encoding specified by PYTHONIOENCODING envvar is ignored for interactive console buffers unless PYTHONLEGACYWINDOWSIOENCODING envvar is set to a non-empty string.
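The same "replace" error handler can also be attached to a stream from inside a script. The sketch below is only an illustration: it uses an in-memory buffer to stand in for a cp437 console, and the helper name replacing_writer is an assumption of this example.

```python
import io


def replacing_writer(raw, encoding):
    """Wrap a binary stream so unencodable characters become '?' instead of raising."""
    return io.TextIOWrapper(raw, encoding=encoding, errors="replace")


buffer = io.BytesIO()                    # stands in for a cp437 console
out = replacing_writer(buffer, "cp437")
print("[\N{EURO SIGN}]", file=out)       # no UnicodeEncodeError here
out.flush()
print(buffer.getvalue())                 # the euro sign was written as '?'
```

On Python 3.7 or later, sys.stdout.reconfigure(errors="replace") applies the same handler to the real standard output without changing its encoding.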

Note: This answer is sort of outdated (from 2008). Please use the solution below with care!!
Here is a page that details the problem and a solution (search the page for the text Wrapping sys.stdout into an instance):
PrintFails - Python Wiki
Here's a code excerpt from that page:
$ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
line = u"\u0411\n"; print type(line), len(line); \
sys.stdout.write(line); print line'
UTF-8
<type 'unicode'> 2
Б
Б
$ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
line = u"\u0411\n"; print type(line), len(line); \
sys.stdout.write(line); print line' | cat
None
<type 'unicode'> 2
Б
Б
There's some more information on that page, well worth a read.

Update: On Python 3.6 or later, printing Unicode strings to the console on Windows just works.
So, upgrade to recent Python and you're done. At this point I recommend using 2to3 to update your code to Python 3.x if needed, and just dropping support for Python 2.x. Note that there has been no security support for any version of Python before 3.7 (including Python 2.7) since December 2021.
If you really still need to support earlier versions of Python (including Python 2.7), you can use https://github.com/Drekin/win-unicode-console , which is based on, and uses the same APIs as the code in the answer that was previously linked here. (That link does include some information on Windows font configuration but I doubt it still applies to Windows 8 or later.)
Note: despite other plausible-sounding answers that suggest changing the code page to 65001, that did not work prior to Python 3.8. (It does kind-of work since then, but as pointed out above, you don't need to do so for Python 3.6+ anyway.) Also, changing the default encoding using sys.setdefaultencoding is (still) not a good idea.

If you're not interested in getting a reliable representation of the bad character(s) you might use something like this (working with python >= 2.6, including 3.x):
from __future__ import print_function
import sys

def safeprint(s):
    try:
        print(s)
    except UnicodeEncodeError:
        if sys.version_info >= (3,):
            print(s.encode('utf8').decode(sys.stdout.encoding))
        else:
            print(s.encode('utf8'))

safeprint(u"\N{EM DASH}")
The bad character(s) in the string will be converted into a representation that the Windows console can print.
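If a readable representation of the bad characters does matter, a variant of the same idea is to round-trip the text through the target encoding with an error handler. This is a sketch only, for Python 3; console_safe is a hypothetical helper, and the encoding is passed in explicitly rather than read from sys.stdout.

```python
def console_safe(text, encoding, errors="backslashreplace"):
    """Return text with characters the encoding cannot represent escaped or replaced."""
    return text.encode(encoding, errors=errors).decode(encoding)


# The em dash is not in cp437, so it comes back as a \uXXXX escape.
print(console_safe("a \N{EM DASH} b", "cp437"))

# errors="replace" gives the question-mark behaviour asked for in the question.
print(console_safe("a \N{EM DASH} b", "cp437", errors="replace"))
```

Characters that the encoding does support pass through unchanged, so only the unencodable ones are altered.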

The code below makes Python output to the console as UTF-8, even on Windows.
The console displays the characters well on Windows 7 but not on Windows XP. Even so, it works, and most importantly you get consistent output from your script on all platforms. You will also be able to redirect the output to a file.
The code below was tested with Python 2.6 on Windows.
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import codecs, sys

reload(sys)
sys.setdefaultencoding('utf-8')

print sys.getdefaultencoding()

if sys.platform == 'win32':
    try:
        import win32console
    except:
        print "Python Win32 Extensions module is required.\n You can download it from https://sourceforge.net/projects/pywin32/ (x86 and x64 builds are available)\n"
        exit(-1)
    # win32console implementation of SetConsoleCP does not return a value
    # CP_UTF8 = 65001
    win32console.SetConsoleCP(65001)
    if (win32console.GetConsoleCP() != 65001):
        raise Exception ("Cannot set console codepage to 65001 (UTF-8)")
    win32console.SetConsoleOutputCP(65001)
    if (win32console.GetConsoleOutputCP() != 65001):
        raise Exception ("Cannot set console output codepage to 65001 (UTF-8)")

#import sys, codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)

print "This is an Е乂αmp١ȅ testing Unicode support using Arabic, Latin, Cyrillic, Greek, Hebrew and CJK code points.\n"
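For readers on Python 3 who only want to check which code page the console is using, without installing pywin32, something like the following may be enough. It is a sketch under the assumption that calling the Win32 GetConsoleOutputCP function through ctypes is acceptable; console_output_code_page is a made-up name, and it returns None on other platforms.

```python
import sys


def console_output_code_page():
    """Return the active console output code page on Windows, else None."""
    if sys.platform != "win32":
        return None
    import ctypes  # imported lazily; ctypes.windll exists only on Windows
    # GetConsoleOutputCP returns 0 when the process has no console attached.
    return ctypes.windll.kernel32.GetConsoleOutputCP() or None


print(console_output_code_page())
```

A value of 65001 means the console output code page is UTF-8; values such as 437 or 850 are the legacy OEM code pages discussed above.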

Just run this command in the command line before executing the Python script:
chcp 65001 & set PYTHONIOENCODING=utf-8

Like Giampaolo Rodolà's answer, but even more dirty: I really, really intend to spend a long time (soon) understanding the whole subject of encodings and how they apply to Windoze consoles.
For the moment I just wanted something which would mean my program would NOT CRASH, and which I understood ... and also which didn't involve importing too many exotic modules (in particular I'm using Jython, so half the time a Python module turns out not in fact to be available).
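from __future__ import print_function  # needed for print(..., end='') on Python 2 / Jython

def pr(s):
    try:
        print(s)
    except UnicodeEncodeError:
        # Fall back to printing one character at a time,
        # substituting '?' for anything the console cannot encode.
        for c in s:
            try:
                print(c, end='')
            except UnicodeEncodeError:
                print('?', end='')
        print()  # finish the line, as print(s) would have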
NB "pr" is shorter to type than "print" (and quite a bit shorter to type than "safeprint")...!

Related to the answer by J. F. Sebastian, but more direct.
If you are having this problem when printing to the console/terminal, then do this:
>set PYTHONIOENCODING=UTF-8

For Python 2 try:
print unicode(string, 'unicode-escape')
For Python 3 try:
import os
string = "002 Could've Would've Should've"
os.system('echo ' + string)
Or try win-unicode-console:
pip install win-unicode-console
py -mrun your_script.py

TL;DR:
print(yourstring.encode('ascii','replace').decode('ascii'))
I ran into this myself, working on a Twitch chat (IRC) bot. (Python 2.7 latest)
I wanted to parse chat messages in order to respond...
msg = s.recv(1024).decode("utf-8")
but also print them safely to the console in a human-readable format:
print(msg.encode('ascii','replace').decode('ascii'))
This corrected the issue of the bot throwing UnicodeEncodeError: 'charmap' errors and replaced the unicode characters with ?.

Python 3.6, Windows 7: There are several ways to launch Python: you can use the Python console (which has a Python logo on it) or the Windows console (the one labelled cmd.exe).
I could not print UTF-8 characters in the Windows console. Printing UTF-8 characters gave me this error:
OSError: [WinError 87] The parameter is incorrect
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf8'>
OSError: [WinError 87] The parameter is incorrect
After trying and failing to understand the answer above, I discovered it was only a settings problem. Right-click the title bar of the cmd console window and, on the Font tab, choose Lucida Console.

The cause of your problem is NOT the Win console not willing to accept Unicode (as it does this since I guess Win2k by default). It is the default system encoding. Try this code and see what it gives you:
import sys
sys.getdefaultencoding()
If it says ascii, there's your cause ;-)
You have to create a file called sitecustomize.py and put it on the Python path (I put it under /usr/lib/python2.5/site-packages, but that is different on Windows: it is c:\python\lib\site-packages or something), with the following contents:
import sys
sys.setdefaultencoding('utf-8')
and perhaps you might want to specify the encoding in your files as well:
# -*- coding: UTF-8 -*-
import sys,time
Edit: more info can be found in the excellent Dive into Python book.

Nowadays, the Windows console does not encounter this error, unless you redirect the output.
Here is an example Python script scratch_1.py:
s = "∞"
print(s)
If you run the script as follows, everything works as intended:
python scratch_1.py
∞
However, if you run the following, then you get the same error as in the question:
python scratch_1.py > temp.txt
Traceback (most recent call last):
  File "C:\Users\Wok\AppData\Roaming\JetBrains\PyCharmCE2022.2\scratches\scratch_1.py", line 3, in <module>
    print(s)
  File "C:\Users\Wok\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u221e' in position 0: character maps to <undefined>
To solve this issue with the suggestion present in the original question, i.e. by replacing the erroneous characters with question marks ?, one can proceed as follows:
s = "∞"
try:
    print(s)
except UnicodeEncodeError:
    output_str = s.encode("ascii", errors="replace").decode("ascii")
    print(output_str)
It is important to call decode(), so that the output is a str instead of bytes, and to decode with the same encoding (here "ascii") to avoid creating mojibake.
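The difference between the two runs comes from the stream that print() writes to: a console and a redirected file can have different encodings. A small diagnostic along these lines can show what a script is dealing with. It is a sketch, and describe_stream is a hypothetical helper name.

```python
import io
import sys


def describe_stream(stream):
    """Collect the properties of a text stream that decide whether print() can fail."""
    return {
        "encoding": getattr(stream, "encoding", None),
        "errors": getattr(stream, "errors", None),
        "isatty": stream.isatty(),
    }


# What the script sees for its real standard output.
print(describe_stream(sys.stdout))

# An in-memory stand-in for output redirected to a cp1252 file.
redirected = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
print(describe_stream(redirected))
```

Running it once directly and once with "> temp.txt" makes the change in encoding visible.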

James Sulak asked,
Is there any way I can make Python automatically print a ? instead of failing in this situation?
Other solutions recommend we attempt to modify the Windows environment or replace Python's print() function. The answer below comes closer to fulfilling Sulak's request.
Under Windows 7, Python 3.5 can be made to print Unicode without throwing a UnicodeEncodeError as follows:
In place of:
print(text)
substitute:
print(str(text).encode('utf-8'))
Instead of throwing an exception, Python now displays unprintable Unicode characters as \xNN hex codes, e.g.:
Halmalo n\xe2\x80\x99\xc3\xa9tait plus qu\xe2\x80\x99un point noir
Instead of
Halmalo n’était plus qu’un point noir
Granted, the latter is preferable ceteris paribus, but otherwise the former is completely accurate for diagnostic messages. Because it displays Unicode as literal byte values the former may also assist in diagnosing encode/decode problems.
Note: The str() call above is needed because otherwise encode() causes Python to reject a Unicode character as a tuple of numbers.
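For comparison, here is a short sketch of two ASCII-only representations of the same text in Python 3: the UTF-8 byte escapes shown above, and the code point escapes produced by the built-in ascii(). The variable names are chosen for this example only.

```python
text = "Halmalo n\u2019\u00e9tait plus qu\u2019un point noir"

# Printing the bytes object shows its repr: a b'...' literal with \xNN byte escapes.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)

# ascii() keeps a str and escapes each non-ASCII code point instead.
escaped = ascii(text)
print(escaped)
```

Both outputs are safe to print on any console. The first shows how the text is stored as UTF-8; the second shows which characters it contains.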

The issue is that the default encoding on Windows is set to cp1252 and needs to be set to UTF-8 (check the PEP).
Check default encoding using:
import locale
locale.getpreferredencoding()
You can override the locale settings:
import os
if os.name == "nt":
    import _locale
    _locale._gdl_bak = _locale._getdefaultlocale
    _locale._getdefaultlocale = (lambda *args: (_locale._gdl_bak()[0], 'utf8'))
referenced code from stack link
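Before patching private locale functions, it may help to see what Python currently reports. The sketch below only reads settings; encoding_report is a hypothetical helper, and the utf8_mode flag is read defensively because it exists only on Python 3.7 and later.

```python
import locale
import sys


def encoding_report():
    """Summarise the encodings Python will use by default in this process."""
    return {
        "preferred": locale.getpreferredencoding(False),
        "stdout": getattr(sys.stdout, "encoding", None),
        # True when Python was started in UTF-8 mode.
        "utf8_mode": bool(getattr(sys.flags, "utf8_mode", 0)),
    }


print(encoding_report())
```

On Python 3.7+, starting the interpreter with -X utf8 or setting PYTHONUTF8=1 enables UTF-8 mode, which may make the override above unnecessary.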

Related

Where does Python get the preferred encoding from? [duplicate]

When I try to print a Unicode string in a Windows console, I get an error .
UnicodeEncodeError: 'charmap' codec can't encode character ....
I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this?
Is there any way I can make Python automatically print a ? instead of failing in this situation?
Edit: I'm using Python 2.5.
Note: #LasseV.Karlsen answer with the checkmark is sort of outdated (from 2008). Please use the solutions/answers/suggestions below with care!!
#JFSebastian answer is more relevant as of today (6 Jan 2016).
Update: Python 3.6 implements PEP 528: Change Windows console encoding to UTF-8: the default console on Windows will now accept all Unicode characters. Internally, it uses the same Unicode API as the win-unicode-console package mentioned below. print(unicode_string) should just work now.
I get a UnicodeEncodeError: 'charmap' codec can't encode character... error.
The error means that Unicode characters that you are trying to print can't be represented using the current (chcp) console character encoding. The codepage is often 8-bit encoding such as cp437 that can represent only ~0x100 characters from ~1M Unicode characters:
>>> u"\N{EURO SIGN}".encode('cp437')
Traceback (most recent call last):
...
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in position 0:
character maps to
I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this?
Windows console does accept Unicode characters and it can even display them (BMP only) if the corresponding font is configured. WriteConsoleW() API should be used as suggested in #Daira Hopwood's answer. It can be called transparently i.e., you don't need to and should not modify your scripts if you use win-unicode-console package:
T:\> py -m pip install win-unicode-console
T:\> py -m run your_script.py
See What's the deal with Python 3.4, Unicode, different languages and Windows?
Is there any way I can make Python
automatically print a ? instead of failing in this situation?
If it is enough to replace all unencodable characters with ? in your case then you could set PYTHONIOENCODING envvar:
T:\> set PYTHONIOENCODING=:replace
T:\> python3 -c "print(u'[\N{EURO SIGN}]')"
[?]
In Python 3.6+, the encoding specified by PYTHONIOENCODING envvar is ignored for interactive console buffers unless PYTHONLEGACYWINDOWSIOENCODING envvar is set to a non-empty string.
Note: This answer is sort of outdated (from 2008). Please use the solution below with care!!
Here is a page that details the problem and a solution (search the page for the text Wrapping sys.stdout into an instance):
PrintFails - Python Wiki
Here's a code excerpt from that page:
$ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
line = u"\u0411\n"; print type(line), len(line); \
sys.stdout.write(line); print line'
UTF-8
<type 'unicode'> 2
Б
Б
$ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
line = u"\u0411\n"; print type(line), len(line); \
sys.stdout.write(line); print line' | cat
None
<type 'unicode'> 2
Б
Б
There's some more information on that page, well worth a read.
Update: On Python 3.6 or later, printing Unicode strings to the console on Windows just works.
So, upgrade to recent Python and you're done. At this point I recommend using 2to3 to update your code to Python 3.x if needed, and just dropping support for Python 2.x. Note that there has been no security support for any version of Python before 3.7 (including Python 2.7) since December 2021.
If you really still need to support earlier versions of Python (including Python 2.7), you can use https://github.com/Drekin/win-unicode-console , which is based on, and uses the same APIs as the code in the answer that was previously linked here. (That link does include some information on Windows font configuration but I doubt it still applies to Windows 8 or later.)
Note: despite other plausible-sounding answers that suggest changing the code page to 65001, that did not work prior to Python 3.8. (It does kind-of work since then, but as pointed out above, you don't need to do so for Python 3.6+ anyway.) Also, changing the default encoding using sys.setdefaultencoding is (still) not a good idea.
If you're not interested in getting a reliable representation of the bad character(s) you might use something like this (working with python >= 2.6, including 3.x):
from __future__ import print_function
import sys
def safeprint(s):
try:
print(s)
except UnicodeEncodeError:
if sys.version_info >= (3,):
print(s.encode('utf8').decode(sys.stdout.encoding))
else:
print(s.encode('utf8'))
safeprint(u"\N{EM DASH}")
The bad character(s) in the string will be converted in a representation which is printable by the Windows console.
The below code will make Python output to console as UTF-8 even on Windows.
The console will display the characters well on Windows 7 but on Windows XP it will not display them well, but at least it will work and most important you will have a consistent output from your script on all platforms. You'll be able to redirect the output to a file.
Below code was tested with Python 2.6 on Windows.
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import codecs, sys
reload(sys)
sys.setdefaultencoding('utf-8')
print sys.getdefaultencoding()
if sys.platform == 'win32':
try:
import win32console
except:
print "Python Win32 Extensions module is required.\n You can download it from https://sourceforge.net/projects/pywin32/ (x86 and x64 builds are available)\n"
exit(-1)
# win32console implementation of SetConsoleCP does not return a value
# CP_UTF8 = 65001
win32console.SetConsoleCP(65001)
if (win32console.GetConsoleCP() != 65001):
raise Exception ("Cannot set console codepage to 65001 (UTF-8)")
win32console.SetConsoleOutputCP(65001)
if (win32console.GetConsoleOutputCP() != 65001):
raise Exception ("Cannot set console output codepage to 65001 (UTF-8)")
#import sys, codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)
print "This is an Е乂αmp١ȅ testing Unicode support using Arabic, Latin, Cyrillic, Greek, Hebrew and CJK code points.\n"
Just enter this code in command line before executing python script:
chcp 65001 & set PYTHONIOENCODING=utf-8
Like Giampaolo Rodolà's answer, but even more dirty: I really, really intend to spend a long time (soon) understanding the whole subject of encodings and how they apply to Windoze consoles,
For the moment I just wanted sthg which would mean my program would NOT CRASH, and which I understood ... and also which didn't involve importing too many exotic modules (in particular I'm using Jython, so half the time a Python module turns out not in fact to be available).
def pr(s):
try:
print(s)
except UnicodeEncodeError:
for c in s:
try:
print( c, end='')
except UnicodeEncodeError:
print( '?', end='')
NB "pr" is shorter to type than "print" (and quite a bit shorter to type than "safeprint")...!
Kind of related on the answer by J. F. Sebastian, but more direct.
If you are having this problem when printing to the console/terminal, then do this:
>set PYTHONIOENCODING=UTF-8
For Python 2 try:
print unicode(string, 'unicode-escape')
For Python 3 try:
import os
string = "002 Could've Would've Should've"
os.system('echo ' + string)
Or try win-unicode-console:
pip install win-unicode-console
py -mrun your_script.py
TL;DR:
print(yourstring.encode('ascii','replace').decode('ascii'))
I ran into this myself, working on a Twitch chat (IRC) bot. (Python 2.7 latest)
I wanted to parse chat messages in order to respond...
msg = s.recv(1024).decode("utf-8")
but also print them safely to the console in a human-readable format:
print(msg.encode('ascii','replace').decode('ascii'))
This corrected the issue of the bot throwing UnicodeEncodeError: 'charmap' errors and replaced the unicode characters with ?.
Python 3.6 windows7: There is several way to launch a python you could use the python console (which has a python logo on it) or the windows console (it's written cmd.exe on it).
I could not print utf8 characters in the windows console. Printing utf-8 characters throw me this error:
OSError: [winError 87] The paraneter is incorrect
Exception ignored in: (_io-TextIOwrapper name='(stdout)' mode='w' ' encoding='utf8')
OSError: [WinError 87] The parameter is incorrect
After trying and failing to understand the answer above I discovered it was only a setting problem. Right click on the top of the cmd console windows, on the tab font chose lucida console.
The cause of your problem is NOT the Win console not willing to accept Unicode (as it does this since I guess Win2k by default). It is the default system encoding. Try this code and see what it gives you:
import sys
sys.getdefaultencoding()
if it says ascii, there's your cause ;-)
You have to create a file called sitecustomize.py and put it under python path (I put it under /usr/lib/python2.5/site-packages, but that is differen on Win - it is c:\python\lib\site-packages or something), with the following contents:
import sys
sys.setdefaultencoding('utf-8')
and perhaps you might want to specify the encoding in your files as well:
# -*- coding: UTF-8 -*-
import sys,time
Edit: more info can be found in excellent the Dive into Python book
Nowadays, the Windows console does not encounter this error, unless you redirect the output.
Here is an example Python script scratch_1.py:
s = "∞"
print(s)
If you run the script as follows, everything works as intended:
python scratch_1.py
∞
However, if you run the following, then you get the same error as in the question:
python scratch_1.py > temp.txt
Traceback (most recent call last):
File "C:\Users\Wok\AppData\Roaming\JetBrains\PyCharmCE2022.2\scratches\scratch_1.py", line 3, in <module>
print(s)
File "C:\Users\Wok\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u221e' in position 0: character maps to <undefined>
To solve this issue with the suggestion present in the original question, i.e. by replacing the erroneous characters with question marks ?, one can proceed as follows:
s = "∞"
try:
print(s)
except UnicodeEncodeError:
output_str = s.encode("ascii", errors="replace").decode("ascii")
print(output_str)
It is important:
to call decode(), so that the type of the output is str instead of bytes,
with the same encoding, here "ascii", to avoid the creation of mojibake.
James Sulak asked,
Is there any way I can make Python automatically print a ? instead of failing in this situation?
Other solutions recommend we attempt to modify the Windows environment or replace Python's print() function. The answer below comes closer to fulfilling Sulak's request.
Under Windows 7, Python 3.5 can be made to print Unicode without throwing a UnicodeEncodeError as follows:
In place of:
print(text)
substitute:
print(str(text).encode('utf-8'))
Instead of throwing an exception, Python now displays unprintable Unicode characters as \xNN hex codes, e.g.:
Halmalo n\xe2\x80\x99\xc3\xa9tait plus qu\xe2\x80\x99un point noir
Instead of
Halmalo n’était plus qu’un point noir
Granted, the latter is preferable ceteris paribus, but otherwise the former is completely accurate for diagnostic messages. Because it displays Unicode as literal byte values the former may also assist in diagnosing encode/decode problems.
Note: The str() call above is needed because otherwise encode() causes Python to reject a Unicode character as a tuple of numbers.
The issue is with windows default encoding being set to cp1252, and need to be set to utf-8. (check PEP)
Check default encoding using:
import locale
locale.getpreferredencoding()
You can override locale settings
import os
if os.name == "nt":
import _locale
_locale._gdl_bak = _locale._getdefaultlocale
_locale._getdefaultlocale = (lambda *args: (_locale._gdl_bak()[0], 'utf8'))
referenced code from stack link

Pass unicode character to from os.walk to print function [duplicate]

When I try to print a Unicode string in a Windows console, I get an error .
UnicodeEncodeError: 'charmap' codec can't encode character ....
I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this?
Is there any way I can make Python automatically print a ? instead of failing in this situation?
Edit: I'm using Python 2.5.
Note: #LasseV.Karlsen answer with the checkmark is sort of outdated (from 2008). Please use the solutions/answers/suggestions below with care!!
#JFSebastian answer is more relevant as of today (6 Jan 2016).
Update: Python 3.6 implements PEP 528: Change Windows console encoding to UTF-8: the default console on Windows will now accept all Unicode characters. Internally, it uses the same Unicode API as the win-unicode-console package mentioned below. print(unicode_string) should just work now.
I get a UnicodeEncodeError: 'charmap' codec can't encode character... error.
The error means that Unicode characters that you are trying to print can't be represented using the current (chcp) console character encoding. The codepage is often 8-bit encoding such as cp437 that can represent only ~0x100 characters from ~1M Unicode characters:
>>> u"\N{EURO SIGN}".encode('cp437')
Traceback (most recent call last):
...
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in position 0:
character maps to
I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this?
Windows console does accept Unicode characters and it can even display them (BMP only) if the corresponding font is configured. WriteConsoleW() API should be used as suggested in #Daira Hopwood's answer. It can be called transparently i.e., you don't need to and should not modify your scripts if you use win-unicode-console package:
T:\> py -m pip install win-unicode-console
T:\> py -m run your_script.py
See What's the deal with Python 3.4, Unicode, different languages and Windows?
Is there any way I can make Python
automatically print a ? instead of failing in this situation?
If it is enough to replace all unencodable characters with ? in your case then you could set PYTHONIOENCODING envvar:
T:\> set PYTHONIOENCODING=:replace
T:\> python3 -c "print(u'[\N{EURO SIGN}]')"
[?]
In Python 3.6+, the encoding specified by PYTHONIOENCODING envvar is ignored for interactive console buffers unless PYTHONLEGACYWINDOWSIOENCODING envvar is set to a non-empty string.
Note: This answer is sort of outdated (from 2008). Please use the solution below with care!!
Here is a page that details the problem and a solution (search the page for the text Wrapping sys.stdout into an instance):
PrintFails - Python Wiki
Here's a code excerpt from that page:
$ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
line = u"\u0411\n"; print type(line), len(line); \
sys.stdout.write(line); print line'
UTF-8
<type 'unicode'> 2
Б
Б
$ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
line = u"\u0411\n"; print type(line), len(line); \
sys.stdout.write(line); print line' | cat
None
<type 'unicode'> 2
Б
Б
There's some more information on that page, well worth a read.
Update: On Python 3.6 or later, printing Unicode strings to the console on Windows just works.
So, upgrade to recent Python and you're done. At this point I recommend using 2to3 to update your code to Python 3.x if needed, and just dropping support for Python 2.x. Note that there has been no security support for any version of Python before 3.7 (including Python 2.7) since December 2021.
If you really still need to support earlier versions of Python (including Python 2.7), you can use https://github.com/Drekin/win-unicode-console , which is based on, and uses the same APIs as the code in the answer that was previously linked here. (That link does include some information on Windows font configuration but I doubt it still applies to Windows 8 or later.)
Note: despite other plausible-sounding answers that suggest changing the code page to 65001, that did not work prior to Python 3.8. (It does kind-of work since then, but as pointed out above, you don't need to do so for Python 3.6+ anyway.) Also, changing the default encoding using sys.setdefaultencoding is (still) not a good idea.
If you're not interested in getting a reliable representation of the bad character(s) you might use something like this (working with python >= 2.6, including 3.x):
from __future__ import print_function
import sys
def safeprint(s):
try:
print(s)
except UnicodeEncodeError:
if sys.version_info >= (3,):
print(s.encode('utf8').decode(sys.stdout.encoding))
else:
print(s.encode('utf8'))
safeprint(u"\N{EM DASH}")
The bad character(s) in the string will be converted in a representation which is printable by the Windows console.
The below code will make Python output to console as UTF-8 even on Windows.
The console will display the characters well on Windows 7 but on Windows XP it will not display them well, but at least it will work and most important you will have a consistent output from your script on all platforms. You'll be able to redirect the output to a file.
Below code was tested with Python 2.6 on Windows.
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import codecs, sys
reload(sys)
sys.setdefaultencoding('utf-8')
print sys.getdefaultencoding()
if sys.platform == 'win32':
try:
import win32console
except:
print "Python Win32 Extensions module is required.\n You can download it from https://sourceforge.net/projects/pywin32/ (x86 and x64 builds are available)\n"
exit(-1)
# win32console implementation of SetConsoleCP does not return a value
# CP_UTF8 = 65001
win32console.SetConsoleCP(65001)
if (win32console.GetConsoleCP() != 65001):
raise Exception ("Cannot set console codepage to 65001 (UTF-8)")
win32console.SetConsoleOutputCP(65001)
if (win32console.GetConsoleOutputCP() != 65001):
raise Exception ("Cannot set console output codepage to 65001 (UTF-8)")
#import sys, codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)
print "This is an Е乂αmp١ȅ testing Unicode support using Arabic, Latin, Cyrillic, Greek, Hebrew and CJK code points.\n"
Just enter this code in command line before executing python script:
chcp 65001 & set PYTHONIOENCODING=utf-8
Like Giampaolo Rodolà's answer, but even more dirty: I really, really intend to spend a long time (soon) understanding the whole subject of encodings and how they apply to Windoze consoles,
For the moment I just wanted sthg which would mean my program would NOT CRASH, and which I understood ... and also which didn't involve importing too many exotic modules (in particular I'm using Jython, so half the time a Python module turns out not in fact to be available).
def pr(s):
try:
print(s)
except UnicodeEncodeError:
for c in s:
try:
print( c, end='')
except UnicodeEncodeError:
print( '?', end='')
NB "pr" is shorter to type than "print" (and quite a bit shorter to type than "safeprint")...!
Kind of related on the answer by J. F. Sebastian, but more direct.
If you are having this problem when printing to the console/terminal, then do this:
>set PYTHONIOENCODING=UTF-8
For Python 2 try:
print unicode(string, 'unicode-escape')
For Python 3 try:
import os
string = "002 Could've Would've Should've"
os.system('echo ' + string)
Or try win-unicode-console:
pip install win-unicode-console
py -m run your_script.py
TL;DR:
print(yourstring.encode('ascii','replace').decode('ascii'))
I ran into this myself, working on a Twitch chat (IRC) bot. (Python 2.7 latest)
I wanted to parse chat messages in order to respond...
msg = s.recv(1024).decode("utf-8")
but also print them safely to the console in a human-readable format:
print(msg.encode('ascii','replace').decode('ascii'))
This corrected the issue of the bot throwing UnicodeEncodeError: 'charmap' errors and replaced the unicode characters with ?.
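If a plain ? loses too much information, the other standard codec error handlers can be swapped in for 'replace'. A small sketch, assuming Python 3.5+ (for 'namereplace'); ascii_variants is just an illustrative name.

```python
def ascii_variants(text):
    """Render text as ASCII using several standard codec error handlers."""
    handlers = ("replace", "backslashreplace", "xmlcharrefreplace", "namereplace")
    return {h: text.encode("ascii", errors=h).decode("ascii") for h in handlers}


for handler, rendered in ascii_variants("1 \u221e 2").items():
    print(handler, "->", rendered)
```

All four results are pure ASCII, so they print on any console; which one to pick depends on whether the output is for humans or for later decoding.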
Python 3.6 on Windows 7: there are several ways to launch Python. You can use the Python console (the one with the Python logo) or the Windows console (the one labelled cmd.exe).
I could not print UTF-8 characters in the Windows console. Printing them gave me this error:
OSError: [WinError 87] The parameter is incorrect
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf8'>
OSError: [WinError 87] The parameter is incorrect
After trying and failing to understand the answer above, I discovered it was only a settings problem. Right-click the title bar of the cmd console window, open the Font tab, and choose Lucida Console.
The cause of your problem is NOT that the Windows console refuses to accept Unicode (it has done so by default since, I guess, Win2k). It is the default system encoding. Try this code and see what it gives you:
import sys
sys.getdefaultencoding()
If it says ascii, there's your cause ;-)
You have to create a file called sitecustomize.py and put it on the Python path (I put it under /usr/lib/python2.5/site-packages, but that is different on Windows: it is c:\python\lib\site-packages or something), with the following contents:
import sys
sys.setdefaultencoding('utf-8')
and perhaps you might want to specify the encoding in your files as well:
# -*- coding: UTF-8 -*-
import sys,time
Edit: more info can be found in the excellent Dive into Python book.
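The advice above is for Python 2. On Python 3 the check looks different: sys.getdefaultencoding() always reports 'utf-8' and sys.setdefaultencoding() no longer exists, so the value to look at is the encoding of the stream itself. A rough sketch; encoding_report is a made-up helper name.

```python
import sys


def encoding_report():
    """Collect the encodings that matter when printing on Python 3."""
    return {
        # Always 'utf-8' on Python 3; print() does not use it.
        "default": sys.getdefaultencoding(),
        # What print() actually encodes with; may be absent on replaced streams.
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
    }


print(encoding_report())
```

If "stdout" comes back as something like cp437 or cp1252, that is the encoding the UnicodeEncodeError is complaining about.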
Nowadays, the Windows console does not encounter this error, unless you redirect the output.
Here is an example Python script scratch_1.py:
s = "∞"
print(s)
If you run the script as follows, everything works as intended:
python scratch_1.py
∞
However, if you run the following, then you get the same error as in the question:
python scratch_1.py > temp.txt
Traceback (most recent call last):
File "C:\Users\Wok\AppData\Roaming\JetBrains\PyCharmCE2022.2\scratches\scratch_1.py", line 3, in <module>
print(s)
File "C:\Users\Wok\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u221e' in position 0: character maps to <undefined>
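The redirected case can be reproduced without a console by wrapping an in-memory byte buffer the way Python wraps a redirected stdout. This is a sketch under the assumption that the redirected stream uses cp1252, as the traceback above suggests; write_as is a hypothetical helper.

```python
import io


def write_as(text, encoding, errors="strict"):
    """Encode text the way a redirected stdout would and return the bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # stop the wrapper from closing the buffer later
    return data


# With errors="replace" the write succeeds and the character becomes '?'.
print(write_as("\u221e", "cp1252", errors="replace"))
```

Calling write_as("\u221e", "cp1252") with the default strict handler raises the same UnicodeEncodeError as the redirected script.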
To solve this issue in the way the original question suggests, i.e. by replacing the unencodable characters with question marks ?, one can proceed as follows:
s = "∞"
try:
    print(s)
except UnicodeEncodeError:
    output_str = s.encode("ascii", errors="replace").decode("ascii")
    print(output_str)
It is important to call decode(), so that the type of the output is str instead of bytes, and to call it with the same encoding, here "ascii", to avoid creating mojibake.
James Sulak asked,
Is there any way I can make Python automatically print a ? instead of failing in this situation?
Other solutions recommend we attempt to modify the Windows environment or replace Python's print() function. The answer below comes closer to fulfilling Sulak's request.
Under Windows 7, Python 3.5 can be made to print Unicode without throwing a UnicodeEncodeError as follows:
In place of:
print(text)
substitute:
print(str(text).encode('utf-8'))
Instead of throwing an exception, Python now displays unprintable Unicode characters as \xNN hex codes, e.g.:
Halmalo n\xe2\x80\x99\xc3\xa9tait plus qu\xe2\x80\x99un point noir
Instead of
Halmalo n’était plus qu’un point noir
Granted, the latter is preferable ceteris paribus, but otherwise the former is completely accurate for diagnostic messages. Because it displays Unicode as literal byte values the former may also assist in diagnosing encode/decode problems.
Note: The str() call above is needed because otherwise encode() causes Python to reject a Unicode character as a tuple of numbers.
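For comparison, here is what the two diagnostic renderings look like side by side on Python 3. This is my own sketch, not the answerer's code: utf8_bytes_repr mirrors the print(str(text).encode('utf-8')) trick, and escaped uses the built-in ascii(), which shows code points rather than UTF-8 bytes.

```python
def utf8_bytes_repr(text):
    """What print(str(text).encode('utf-8')) shows on Python 3: a bytes repr."""
    return repr(str(text).encode("utf-8"))


def escaped(text):
    """ASCII-only rendering with Unicode escapes, via the built-in ascii()."""
    return ascii(text)


sample = "qu\u2019un"
print(utf8_bytes_repr(sample))
print(escaped(sample))
```

The first form is useful when the question is "which bytes went out"; the second when the question is "which characters were in the string".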
The issue is that the Windows default encoding is set to cp1252 and needs to be set to utf-8. (check PEP)
Check default encoding using:
import locale
locale.getpreferredencoding()
You can override the locale settings:
import os
if os.name == "nt":
    import _locale
    _locale._gdl_bak = _locale._getdefaultlocale
    _locale._getdefaultlocale = (lambda *args: (_locale._gdl_bak()[0], 'utf8'))
referenced code from stack link
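Before patching a private module like _locale, it may be worth checking what Python already reports. A hedged sketch, assuming Python 3.7+, where UTF-8 mode (-X utf8 or PYTHONUTF8=1) is available as a supported way to get a UTF-8 locale encoding; locale_encoding_info is a hypothetical helper and not part of the answer above.

```python
import locale
import sys


def locale_encoding_info():
    """Report the locale encoding and whether UTF-8 mode is active."""
    return {
        "preferred": locale.getpreferredencoding(False),
        # sys.flags.utf8_mode exists on Python 3.7+; 1 when UTF-8 mode is on.
        "utf8_mode": getattr(sys.flags, "utf8_mode", None),
    }


print(locale_encoding_info())
```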

Python, Unicode, and the Windows console

When I try to print a Unicode string in a Windows console, I get an error:
UnicodeEncodeError: 'charmap' codec can't encode character ....
I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this?
Is there any way I can make Python automatically print a ? instead of failing in this situation?
Edit: I'm using Python 2.5.
Note: @LasseV.Karlsen's answer with the checkmark is somewhat outdated (from 2008). Please use the solutions/answers/suggestions below with care!
@JFSebastian's answer is more relevant as of today (6 Jan 2016).
Update: Python 3.6 implements PEP 528: Change Windows console encoding to UTF-8: the default console on Windows will now accept all Unicode characters. Internally, it uses the same Unicode API as the win-unicode-console package mentioned below. print(unicode_string) should just work now.
I get a UnicodeEncodeError: 'charmap' codec can't encode character... error.
The error means that the Unicode characters you are trying to print can't be represented in the current (chcp) console character encoding. The code page is often an 8-bit encoding such as cp437, which can represent only ~0x100 of the ~1M Unicode characters:
>>> u"\N{EURO SIGN}".encode('cp437')
Traceback (most recent call last):
...
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in position 0:
character maps to <undefined>
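To see which code pages can represent a given character without triggering the traceback, the encode call can be wrapped in a small check. A minimal sketch; can_encode is a hypothetical helper name.

```python
def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for name in ("cp437", "cp1252", "utf-8"):
    print(name, can_encode("\N{EURO SIGN}", name))
```

The euro sign is missing from cp437 but present in cp1252, which is why the same script can fail in one console and work in another.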
I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this?
The Windows console does accept Unicode characters, and it can even display them (BMP only) if a suitable font is configured. The WriteConsoleW() API should be used, as suggested in @Daira Hopwood's answer. It can be called transparently, i.e., you don't need to and should not modify your scripts if you use the win-unicode-console package:
T:\> py -m pip install win-unicode-console
T:\> py -m run your_script.py
See What's the deal with Python 3.4, Unicode, different languages and Windows?
Is there any way I can make Python automatically print a ? instead of failing in this situation?
If it is enough in your case to replace all unencodable characters with ?, then you could set the PYTHONIOENCODING environment variable:
T:\> set PYTHONIOENCODING=:replace
T:\> python3 -c "print(u'[\N{EURO SIGN}]')"
[?]
In Python 3.6+, the encoding specified by PYTHONIOENCODING envvar is ignored for interactive console buffers unless PYTHONLEGACYWINDOWSIOENCODING envvar is set to a non-empty string.
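The effect of the variable can be checked from Python itself by running a child interpreter with its output piped. This is a sketch, not the answerer's method: run_with_ioencoding is a made-up helper, and it uses 'ascii:replace' rather than ':replace' so that the result does not depend on the locale of the machine running it.

```python
import os
import subprocess
import sys


def run_with_ioencoding(code, ioencoding):
    """Run code in a child interpreter with PYTHONIOENCODING set; return raw stdout."""
    env = dict(os.environ, PYTHONIOENCODING=ioencoding)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


# The child's stdout is a pipe, not a console, so the variable is honoured.
out = run_with_ioencoding(r"print('[\N{EURO SIGN}]')", "ascii:replace")
print(out.strip())
```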
Note: This answer is sort of outdated (from 2008). Please use the solution below with care!!
Here is a page that details the problem and a solution (search the page for the text Wrapping sys.stdout into an instance):
PrintFails - Python Wiki
Here's a code excerpt from that page:
$ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
line = u"\u0411\n"; print type(line), len(line); \
sys.stdout.write(line); print line'
UTF-8
<type 'unicode'> 2
Б
Б
$ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
line = u"\u0411\n"; print type(line), len(line); \
sys.stdout.write(line); print line' | cat
None
<type 'unicode'> 2
Б
Б
There's some more information on that page, well worth a read.
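The same wrapping idea carries over to Python 3, where the writer has to sit on top of a binary stream. The following is a minimal sketch under that assumption. The BytesIO objects stand in for sys.stdout.buffer so the encoded bytes can be inspected, and the variable names are illustrative only.

```python
import codecs
import io

# A StreamWriter encodes text as it is written to the underlying binary stream.
raw_utf8 = io.BytesIO()  # stands in for sys.stdout.buffer
writer = codecs.getwriter("utf-8")(raw_utf8)
writer.write("\u0411\n")
utf8_bytes = raw_utf8.getvalue()

# With errors="replace", characters missing from the target encoding become '?'.
raw_cp437 = io.BytesIO()
lossy_writer = codecs.getwriter("cp437")(raw_cp437, errors="replace")
lossy_writer.write("\u0411\n")
cp437_bytes = raw_cp437.getvalue()
```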
Update: On Python 3.6 or later, printing Unicode strings to the console on Windows just works.
So, upgrade to recent Python and you're done. At this point I recommend using 2to3 to update your code to Python 3.x if needed, and just dropping support for Python 2.x. Note that there has been no security support for any version of Python before 3.7 (including Python 2.7) since December 2021.
If you really still need to support earlier versions of Python (including Python 2.7), you can use https://github.com/Drekin/win-unicode-console , which is based on, and uses the same APIs as the code in the answer that was previously linked here. (That link does include some information on Windows font configuration but I doubt it still applies to Windows 8 or later.)
Note: despite other plausible-sounding answers that suggest changing the code page to 65001, that did not work prior to Python 3.8. (It does kind-of work since then, but as pointed out above, you don't need to do so for Python 3.6+ anyway.) Also, changing the default encoding using sys.setdefaultencoding is (still) not a good idea.
If you're not interested in getting a reliable representation of the bad character(s) you might use something like this (working with python >= 2.6, including 3.x):
from __future__ import print_function
import sys

def safeprint(s):
    try:
        print(s)
    except UnicodeEncodeError:
        if sys.version_info >= (3,):
            print(s.encode('utf8').decode(sys.stdout.encoding))
        else:
            print(s.encode('utf8'))

safeprint(u"\N{EM DASH}")
The bad character(s) in the string will be converted into a representation which is printable by the Windows console.
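If a plain ? is acceptable instead of the UTF-8 byte sequence reinterpreted in the console encoding, a Python 3 variant can round-trip the text through the target encoding with the replace error handler. This is a sketch, not the answerer's code. console_safe is a hypothetical name, and falling back to ASCII when the stream reports no encoding is an assumption.

```python
import sys


def console_safe(text, encoding=None):
    """Round-trip text through encoding, turning unencodable characters into '?'."""
    # Assumption: use ASCII when the stream does not report an encoding.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors="replace").decode(encoding)


print(console_safe("a\N{EM DASH}b", "cp437"))
```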
The code below makes Python output to the console as UTF-8, even on Windows.
The console displays the characters well on Windows 7. On Windows XP it does not display them well, but the script will at least run and, most importantly, produce consistent output on all platforms. You'll also be able to redirect the output to a file.
The code below was tested with Python 2.6 on Windows.
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import codecs, sys
reload(sys)
sys.setdefaultencoding('utf-8')
print sys.getdefaultencoding()
if sys.platform == 'win32':
    try:
        import win32console
    except:
        print "Python Win32 Extensions module is required.\n You can download it from https://sourceforge.net/projects/pywin32/ (x86 and x64 builds are available)\n"
        exit(-1)
    # win32console implementation of SetConsoleCP does not return a value
    # CP_UTF8 = 65001
    win32console.SetConsoleCP(65001)
    if (win32console.GetConsoleCP() != 65001):
        raise Exception ("Cannot set console codepage to 65001 (UTF-8)")
    win32console.SetConsoleOutputCP(65001)
    if (win32console.GetConsoleOutputCP() != 65001):
        raise Exception ("Cannot set console output codepage to 65001 (UTF-8)")
#import sys, codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)
print "This is an Е乂αmp١ȅ testing Unicode support using Arabic, Latin, Cyrillic, Greek, Hebrew and CJK code points.\n"
Just enter this command at the command line before running the Python script:
chcp 65001 & set PYTHONIOENCODING=utf-8
Like Giampaolo Rodolà's answer, but even more dirty: I really, really intend to spend a long time (soon) understanding the whole subject of encodings and how they apply to Windoze consoles.
For the moment I just wanted something which would mean my program would NOT CRASH, and which I understood ... and also which didn't involve importing too many exotic modules (in particular I'm using Jython, so half the time a Python module turns out not to be available).
def pr(s):
    try:
        print(s)
    except UnicodeEncodeError:
        for c in s:
            try:
                print( c, end='')
            except UnicodeEncodeError:
                print( '?', end='')
NB "pr" is shorter to type than "print" (and quite a bit shorter to type than "safeprint")...!
Kind of related to the answer by J. F. Sebastian, but more direct.
If you are having this problem when printing to the console/terminal, then do this:
>set PYTHONIOENCODING=UTF-8
For Python 2 try:
print unicode(string, 'unicode-escape')
For Python 3 try:
import os
string = "002 Could've Would've Should've"
os.system('echo ' + string)
Or try win-unicode-console:
pip install win-unicode-console
py -mrun your_script.py
TL;DR:
print(yourstring.encode('ascii','replace').decode('ascii'))
I ran into this myself, working on a Twitch chat (IRC) bot. (Python 2.7 latest)
I wanted to parse chat messages in order to respond...
msg = s.recv(1024).decode("utf-8")
but also print them safely to the console in a human-readable format:
print(msg.encode('ascii','replace').decode('ascii'))
This corrected the issue of the bot throwing UnicodeEncodeError: 'charmap' errors and replaced the unicode characters with ?.
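The 'replace' handler is one of several standard error handlers, and some of the others keep more information about the original character. The comparison below is a small Python 3 sketch. ascii_view is a hypothetical helper name, and the sample message is made up for illustration.

```python
def ascii_view(text, handler):
    """Return an ASCII-only rendering of text using the given codec error handler."""
    return text.encode("ascii", errors=handler).decode("ascii")


msg = "caf\u00e9 \u2013 ok"  # contains e-acute and an en dash

for handler in ("replace", "ignore", "backslashreplace", "xmlcharrefreplace"):
    print(handler, "->", ascii_view(msg, handler))
```

'backslashreplace' and 'xmlcharrefreplace' are lossless in the sense that the original code point can be read back from the output, which may matter when the console text is used for debugging.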
Python 3.6, Windows 7: There are several ways to launch Python. You can use the Python console (which has a Python logo on it) or the Windows console (the one labeled cmd.exe).
I could not print UTF-8 characters in the Windows console. Printing UTF-8 characters gave me this error:
OSError: [WinError 87] The parameter is incorrect
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf8'>
OSError: [WinError 87] The parameter is incorrect
After trying and failing to understand the answers above, I discovered it was only a settings problem. Right-click the title bar of the cmd console window and, on the Font tab, choose Lucida Console.
The cause of your problem is NOT that the Windows console is unwilling to accept Unicode (it has done so by default since, I guess, Win2k). It is the default system encoding. Try this code and see what it gives you:
import sys
sys.getdefaultencoding()
If it says ascii, there's your cause ;-)
You have to create a file called sitecustomize.py and put it on the Python path (I put it under /usr/lib/python2.5/site-packages, but that is different on Windows: it is c:\python\lib\site-packages or something), with the following contents:
import sys
sys.setdefaultencoding('utf-8')
and perhaps you might want to specify the encoding in your files as well:
# -*- coding: UTF-8 -*-
import sys,time
Edit: more info can be found in the excellent Dive into Python book.
Nowadays, the Windows console does not encounter this error, unless you redirect the output.
Here is an example Python script scratch_1.py:
s = "∞"
print(s)
If you run the script as follows, everything works as intended:
python scratch_1.py
∞
However, if you run the following, then you get the same error as in the question:
python scratch_1.py > temp.txt
Traceback (most recent call last):
File "C:\Users\Wok\AppData\Roaming\JetBrains\PyCharmCE2022.2\scratches\scratch_1.py", line 3, in <module>
print(s)
File "C:\Users\Wok\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u221e' in position 0: character maps to <undefined>
To solve this issue with the suggestion present in the original question, i.e. by replacing the erroneous characters with question marks ?, one can proceed as follows:
s = "∞"
try:
print(s)
except UnicodeEncodeError:
output_str = s.encode("ascii", errors="replace").decode("ascii")
print(output_str)
It is important to call decode(), so that the type of the output is str instead of bytes, and to do so with the same encoding, here "ascii", to avoid the creation of mojibake.
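If the character should be kept in the file instead of being replaced, one alternative to shell redirection is to let the script open the output file itself with an explicit encoding. The sketch below assumes UTF-8 is acceptable for the file. The temporary directory and the variable names are illustrative only.

```python
import os
import tempfile

s = "\u221e"  # the infinity sign from the script above

# Write the file with an explicit encoding instead of relying on the
# encoding Python picks for a redirected stdout (cp1252 in the traceback).
path = os.path.join(tempfile.mkdtemp(), "temp.txt")
with open(path, "w", encoding="utf-8") as f:
    print(s, file=f)

# Read the raw bytes back to confirm what was stored.
with open(path, "rb") as f:
    data = f.read()
```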
James Sulak asked,
Is there any way I can make Python automatically print a ? instead of failing in this situation?
Other solutions recommend we attempt to modify the Windows environment or replace Python's print() function. The answer below comes closer to fulfilling Sulak's request.
Under Windows 7, Python 3.5 can be made to print Unicode without throwing a UnicodeEncodeError as follows:
In place of:
print(text)
substitute:
print(str(text).encode('utf-8'))
Instead of throwing an exception, Python now displays unprintable Unicode characters as \xNN hex codes, e.g.:
Halmalo n\xe2\x80\x99\xc3\xa9tait plus qu\xe2\x80\x99un point noir
Instead of
Halmalo n’était plus qu’un point noir
Granted, the latter is preferable ceteris paribus, but otherwise the former is completely accurate for diagnostic messages. Because it displays Unicode as literal byte values the former may also assist in diagnosing encode/decode problems.
Note: The str() call above is needed because otherwise encode() causes Python to reject a Unicode character as a tuple of numbers.
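For comparison, the built-in ascii() function gives a similar diagnostic view but escapes code points instead of UTF-8 bytes. The snippet below is a Python 3 sketch with an illustrative sample string. It builds both renderings as plain strings so they can be printed on any console.

```python
text = "qu\u2019un"  # contains a right single quotation mark (U+2019)

# repr() of the encoded bytes is the text that print(text.encode('utf-8')) shows.
as_bytes = repr(text.encode("utf-8"))
# ascii() escapes each non-ASCII code point instead of its UTF-8 bytes.
as_code_points = ascii(text)

print(as_bytes)
print(as_code_points)
```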
The issue is that the Windows default encoding is set to cp1252 and needs to be set to utf-8. (check PEP)
Check default encoding using:
import locale
locale.getpreferredencoding()
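Several different encodings are in play here, so it can help to look at them side by side. The helper below is a hypothetical diagnostic sketch for Python 3, not part of the answer. In Python 3, sys.getdefaultencoding() always reports utf-8, so the stream and locale values are the ones that vary between machines.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings that commonly get confused with one another."""
    return {
        "default": sys.getdefaultencoding(),              # str <-> bytes default
        "stdout": getattr(sys.stdout, "encoding", None),  # used by print()
        "locale": locale.getpreferredencoding(False),     # used by open() by default
    }


for name, value in report_encodings().items():
    print(name, "=", value)
```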
You can override locale settings
import os
if os.name == "nt":
    import _locale
    _locale._gdl_bak = _locale._getdefaultlocale
    _locale._getdefaultlocale = (lambda *args: (_locale._gdl_bak()[0], 'utf8'))
referenced code from stack link
