Why doesn't Git Bash run Python code properly? [duplicate] - python

In test.py I have:
print('Привет мир')
With cmd it ran without an error:
> python test.py
?????? ???
With Git Bash I got an error:
$ python test.py
Traceback (most recent call last):
File "test.py", line 2, in <module>
print('\u041f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440')
File "C:\Users\raksa\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>
Does anyone know why I get an error when I run Python code via Git Bash?

Python 3.6 uses the Windows API directly to write Unicode to the console, so it is much better at printing non-ASCII characters. But Git Bash isn't the standard Windows console, so Python falls back to its previous behavior: encoding Unicode strings in the terminal encoding (in your case, cp1252). cp1252 doesn't support Cyrillic, so the encode fails. This is "normal". You'll see the same behavior in Python 3.5 and older.
In the Windows console Python 3.6 should print the actual Cyrillic characters, so what is surprising is your "?????? ???". That is not "normal", but perhaps you don't have a font selected that supports Cyrillic. I have a couple of Python versions installed:
C:\>py -3.6 --version
Python 3.6.2
C:\>py -3.6 test.py
Привет мир
C:\>py -3.3 --version
Python 3.3.5
C:\>py -3.3 test.py
Traceback (most recent call last):
File "test.py", line 1, in <module>
print('\u041f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440 \u4f60\u597d')
File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>
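The failure can be reproduced without any console, since it comes from the codec rather than from Git Bash itself. A minimal sketch (the helper name try_encode is made up for illustration):

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


text = "Привет мир"

# cp1252 (Western European) and cp437 (US DOS) have no Cyrillic letters.
print("cp1252:", try_encode(text, "cp1252"))
print("cp437: ", try_encode(text, "cp437"))
# cp1251 (Cyrillic) and UTF-8 can both represent the string.
print("cp1251:", try_encode(text, "cp1251"))
print("utf-8: ", try_encode(text, "utf-8"))
```

The first two calls return None, which is the same condition that surfaces as UnicodeEncodeError when print() has to encode for a cp1252 or cp437 terminal.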

Had this problem with python 3.9
import sys, locale
print("encoding", sys.stdout.encoding)
print("local preferred", locale.getpreferredencoding())
print("fs encoding", sys.getfilesystemencoding())
If this reports "cp1252" rather than "utf-8", print() raises UnicodeEncodeError for any character that cp1252 cannot represent.
This was fixed by changing the Windows system locale:
Region settings > Additional settings > Administrative > Change system locale > Beta: Use Unicode UTF-8 for worldwide language support
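To check up front whether a particular string will print, something like the following might help. It is only a sketch: stdout_can_encode is a hypothetical helper, and it ignores the stream's error handler (a stream opened with errors='replace' would not raise even when this returns False).

```python
import io
import sys


def stdout_can_encode(text, stream=None):
    """Report whether text can be encoded with the stream's encoding."""
    stream = sys.stdout if stream is None else stream
    # Assumption: a replaced stream without an encoding attribute is treated as UTF-8.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Stand-ins for a cp1252 terminal and a UTF-8 terminal.
cp1252_stream = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
utf8_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")

print("cp1252:", stdout_can_encode("Привет мир", cp1252_stream))
print("utf-8: ", stdout_can_encode("Привет мир", utf8_stream))
print("real stdout:", stdout_can_encode("Привет мир"))
```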

Since Python 3.7 you can do
import sys
sys.stdout.reconfigure(encoding='utf-8')
This mostly fixes the git bash problem for me with Chinese characters. They still don't print correctly to standard out on the console, but it doesn't crash, and when redirected to a file the correct unicode characters are present.
Credit to sth in this answer.
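The effect of reconfigure() can be seen without a real terminal. The sketch below uses an in-memory io.TextIOWrapper as a stand-in for a stdout that Python opened as cp1252; sys.stdout is normally the same kind of object, but that is an assumption if something has replaced it.

```python
import io

# Stand-in for a console stream that was opened with the cp1252 encoding.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="cp1252")

try:
    stream.write("你好")
    stream.flush()
    failed_before = False
except UnicodeEncodeError:
    failed_before = True

# Same call as sys.stdout.reconfigure(encoding='utf-8'), on the stand-in stream.
stream.reconfigure(encoding="utf-8")
stream.write("你好")
stream.flush()

print("failed before reconfigure:", failed_before)
print("bytes written afterwards:", raw.getvalue())
```

Whether the terminal then displays those UTF-8 bytes correctly is a separate matter, which fits the observation above that the output no longer crashes but may still look wrong on screen.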

Set the environment variable PYTHONUTF8=1, or
use the -Xutf8 command-line option (both require Python 3.7 or later).
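One way to confirm that either setting took effect is to ask a child interpreter for sys.flags.utf8_mode. This is a rough sketch; utf8_mode_flag is a made-up helper, and the option is passed here as the two arguments -X utf8.

```python
import os
import subprocess
import sys


def utf8_mode_flag(extra_args=(), env=None):
    """Run a child interpreter and return its sys.flags.utf8_mode as text."""
    cmd = [sys.executable, *extra_args, "-c",
           "import sys; print(sys.flags.utf8_mode)"]
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True, env=env)
    return result.stdout.decode("ascii").strip()


# Command-line option.
print("with -X utf8:   ", utf8_mode_flag(["-X", "utf8"]))
# Environment variable, set only for the child process.
print("with PYTHONUTF8:", utf8_mode_flag(env=dict(os.environ, PYTHONUTF8="1")))
```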

Related

Why does "Save as UTF-8" in Eclipse fix the Python UnicodeEncodeError?

I have:
a file file.txt containing just one character: ♠, and UTF-8 encoded.
a CP-1252 encoded Python script test.py containing:
import codecs
text = codecs.open('file.txt', 'r', 'UTF-8').read()
print('text: {0}'.format(text))
When I run it in Eclipse 4.7.2 on Windows 7 SP1 x64 Ultimate and with Python 3.5.2 x64, I get the error message:
Traceback (most recent call last):
File "C:\eclipse-4-7-2-workspace\SEtest\test.py", line 3, in <module>
print('text: {0}'.format(text))
File "C:\programming\python\Python35-32\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2660' in position 6: character maps to <undefined>
My understanding is that the issue stems from the fact that on Microsoft Windows, the Python interpreter uses CP-1252 as its encoding by default and therefore has issues with the character ♠.
Also, I would note at that point that I kept Eclipse default encoding, which can be seen in Preferences > General > Workspace:
When I change the Python script test.py to:
import codecs
print(u'♠') # <--- adding this line is the only modification
text = codecs.open('file.txt', 'r', 'UTF-8').read()
print('text: {0}'.format(text))
then try to run it, I get the error message:
(note: Eclipse is configured to save the script whenever I run it).
After selecting the option Save as UTF-8, I get the same error message:
Traceback (most recent call last):
File "C:\Users\Francky\eclipse-4-7-2-workspace\SEtest\test.py", line 2, in <module>
print(u'\u2660')
File "C:\programming\python\Python35-32\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2660' in position 0: character maps to <undefined>
which I think is expected since the Python interpreter still uses CP-1252.
But if I run the script again in Eclipse without any modification, it works. The output is:
♠
text: ♠
Why does it work?
Python converts the text to be printed to the encoding of the console, which is the active code page on Windows (at least until version 3.6).
To avoid the UnicodeEncodeError you have to change the console encoding to UTF-8. There are several ways to do this, e.g. on the Windows command line by executing cmd /K chcp 65001.
In Eclipse, the encoding of the console can be set to UTF-8 in the run configuration (Run > Run Configurations...), in the Common tab.
The text file encoding settings in Window > Preferences: General > Workspace and in Project > Properties: Resource are only used by text editors to decide how to display text files.

BeautifulSoup code works in IPython Notebook but not Eclipse

The following code works fine when run from Jupyter IPython notebook:
from bs4 import BeautifulSoup
xml_file_path = "<Path to XML file>"
s = BeautifulSoup(open(xml_file_path), "xml")
But it fails when creating the soup when run from Eclipse/PyDev (which uses the same Python interpreter):
Traceback (most recent call last):
File "~/parser/scratch.py", line 3, in <module>
s = BeautifulSoup(open(xml_file), "xml")
File "/anaconda/lib/python3.5/site-packages/bs4/__init__.py", line 175, in __init__
markup = markup.read()
File "/anaconda/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1812: ordinal not in range(128)
Python version: 3.5.2 (Anaconda 4.1.1)
BeautifulSoup: version 4
IPython Notebook version: 4.2.1
Eclipse version: Mars.2 Release (4.5.2)
PyDev version: 5.1.2.20160623256
Mac OS X: El Capitan 10.11.6
UPDATE:
The character in the file that is causing issue in Eclipse is �, but this causes no issues in IPython Notebook! If I remove this character from the XML file, then the code works fine in Eclipse as well. Is there some setting in Eclipse I need to change so that the code won't fail on this (and possibly other such) character?
I think you have to open the file with open(xml_file_path, 'rb') and specify the encoding yourself for things to work the same in both environments. Otherwise there is an implicit conversion from bytes to unicode, and apparently it uses a different encoding depending on your environment, since you get one result in Eclipse and another in IPython.
Try doing:
with open(xml_file_path, 'rb') as stream:
    contents = stream.read()
contents.decode('utf-8')
Just to check if you're really able to decode it as utf-8 (i.e.: to check if that char is a valid utf-8 char).
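The difference between an implicit and an explicit encoding can be sketched with the standard library alone. The XML content below is a hypothetical sample, not the asker's file; it contains U+FFFD, whose UTF-8 encoding starts with the byte 0xef seen in the traceback.

```python
import os
import tempfile

# Hypothetical sample document containing U+FFFD, saved as UTF-8.
xml_text = '<?xml version="1.0" encoding="UTF-8"?>\n<root>\ufffd</root>\n'
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as f:
    f.write(xml_text.encode("utf-8"))

try:
    # Decoding as ASCII fails on the first byte >= 0x80 (here 0xef).
    try:
        with open(path, encoding="ascii") as f:
            f.read()
        ascii_ok = True
    except UnicodeDecodeError:
        ascii_ok = False

    # Naming the encoding explicitly gives the same result in every environment.
    with open(path, encoding="utf-8") as f:
        contents = f.read()
finally:
    os.remove(path)

print("ascii decode worked:", ascii_ok)
print("utf-8 decode matches:", contents == xml_text)
```

When open() is called without an encoding argument, Python picks a default from the environment, which is presumably why Eclipse and the notebook behaved differently.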

Python run py file - works OK in IDLE, not via command prompt windows 7 - UnicodeEncodeError

I installed Python 3.5.1 several months ago on Windows 7. I have a Python program in a .py file. It runs fine from IDLE.
However, when I run it from command prompt, it says:
C:\_1\Python>info_.py
START -------------------------
Traceback (most recent call last):
File "C:\_1\Python\info_.py", line 33, in <module>
print (data)
File "C:\Program Files\Python35\lib\encodings\cp866.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xab' in position 184: character maps to <undefined>
data is:
u = urllib.request.urlopen('http://www.****')
data = u.read()
I want to run programs from the command prompt too. What can I do to make sure my code works the same in IDLE and via the command prompt on Windows?
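Try encoding the text before printing. Assuming data is a str, this prints the bytes representation (b'...') rather than the readable text, but it avoids the error: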
print(data.encode('utf-8'))
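If readable output matters more than exact characters, another option (not from the answer above, just a sketch) is to replace whatever the console code page cannot represent. The helper printable_for and the sample string are made up for illustration; in a real script the encoding would come from sys.stdout.encoding.

```python
def printable_for(text, encoding):
    """Return text with characters the encoding lacks replaced by '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


sample = "«текст»"  # hypothetical stand-in for the downloaded page
safe = printable_for(sample, "cp866")

# ascii() keeps this demo printable on any terminal.
print(ascii(safe))
```

cp866 has Cyrillic letters but no « or », so only the two quotation marks are replaced.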

Python webbrowser platform specific unicode error on osx

I am developing a cross-platform script on a Windows 7, Python 2.7 computer. The script will be also used on a MacOSX computer with Python 2.7 installed.
The following script is working perfectly on my Windows computer, however when I run it on the Mac, I get a unicode error.
#!/usr/bin/python
# -*- coding: utf-8 -*-
import webbrowser
webbrowser.open(u"http://www.google.fr?q=testéè")
Here is the error:
Mac-mini-de-paul:paul paul$ python testUnicode.py
Traceback (most recent call last):
File "testUnicode.py", line 6, in <module>
webbrowser.open(u"http://www.google.fr?q=testéè")
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/webbrowser.py", line 62, in open
if browser.open(url, new, autoraise):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/webbrowser.py", line 637, in open
osapipe.write(script)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 42-43: ordinal not in range(128)
I don't really understand what's the problem here, Python's base functions are supposed to deal properly with unicode filenames, aren't they?
Note:
I saw this question, but it did not help me and the OP is not having any error: IMO not a duplicate
Try to manually encode to utf-8:
webbrowser.open(u"http://www.google.fr?q=testéè".encode('utf-8'))
or use a plain byte string instead of a unicode literal, since the file declares its encoding:
#!/usr/bin/python
# -*- coding: utf-8 -*-
...
webbrowser.open("http://www.google.fr?q=testéè")
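A further option, not mentioned above, is to percent-encode the query so that the URL is pure ASCII before it reaches webbrowser. The sketch below is written to import on both Python 2 and Python 3; the variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

query = u"testéè"
# Percent-encode the UTF-8 bytes so the final URL contains only ASCII.
url = "http://www.google.fr?q=" + quote(query.encode("utf-8"))
print(url)

# import webbrowser; webbrowser.open(url)  # commented out to avoid side effects
```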

Set up Notepad++ and NppExec to print unicode characters from python

I have an utf-8 encoded file cjk.py:
print("打印")
Unsurprisingly, running python cjk.py yields
Traceback (most recent call last):
File "cjk.py", line 1, in <module>
print('\u6253\u5370')
File "C:\Python33\lib\encodings\cp850.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-1: character maps to <undefined>
Yet running idle -r cjk.py works perfectly:
打印
Can I configure Notepad++'s NppExec plugin to behave like IDLE? I've tried setting the input and output encoding to UTF-8, to no avail (same exception as when running python cjk.py from the console).
I had the same problem and fixed it.
Add env_set PYTHONIOENCODING=utf-8
just below C:\Python27\python.exe "$(FULL_CURRENT_PATH)"
in the dialog box when you press F6.
Worked like a charm for me, hope it helps.
Source: http://sourceforge.net/p/npp-plugins/discussion/672146/thread/d94ff609/
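What PYTHONIOENCODING does can be checked outside Notepad++ by launching a child interpreter with the variable set and looking at the raw bytes it writes. A minimal sketch, assuming Python 3 for both parent and child:

```python
import os
import subprocess
import sys

# Copy the current environment and force UTF-8 for the child's stdio.
env = dict(os.environ, PYTHONIOENCODING="utf-8")

# The child prints the same two characters as cjk.py, written as escapes.
child_code = "print('\\u6253\\u5370')"
result = subprocess.run([sys.executable, "-c", child_code],
                        env=env, stdout=subprocess.PIPE, check=True)
output = result.stdout.strip()

print("child wrote UTF-8 bytes:", output == "\u6253\u5370".encode("utf-8"))
```

The child no longer raises UnicodeEncodeError because it encodes to UTF-8 regardless of the console code page; the program that displays the output still has to interpret those bytes as UTF-8.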
