SyntaxError: Non-UTF-8 code starting with '\xc4' - python

Problem while importing module
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import win32com.client
When I run it in Eclipse, a SyntaxError occurs,
but it runs perfectly in the Windows console.
How do I declare the right coding for pywin32?

For Eclipse Unicode console support:
Add -Dfile.encoding=UTF-8 to eclipse.ini, which is in the Eclipse install directory.
In Eclipse - Run\Run Configurations\Python Run\configuration\Common\, make sure UTF-8 is selected.
In Eclipse - Window\Preferences\General\Workspace\Text file encoding\, make sure UTF-8 is selected.
In [python install path]\Lib\site.py - change encoding = "ascii" to encoding = "utf-8".
Make sure you're using Unicode-supporting fonts in Eclipse - Window\Preferences\Appearance\Colors and Fonts\Debug\Console font\Edit
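The error message itself says the interpreter met byte 0xC4, which cannot start a valid UTF-8 sequence at that position. One likely cause, assumed here rather than confirmed by the thread, is a file saved in a Windows code page (in cp1252, 0xC4 is "Ä"). A minimal Python 3 sketch for locating the offending byte and, if you know the legacy encoding, re-saving the file as UTF-8. The helper names and the cp1252 default are hypothetical choices for illustration.

```python
from pathlib import Path


def first_non_utf8_byte(data):
    """Return (line_number, offset, byte) for the first byte in `data` that the
    UTF-8 decoder rejects, or None if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte the decoder rejected.
        line_number = data.count(b"\n", 0, exc.start) + 1
        return line_number, exc.start, data[exc.start:exc.start + 1]
    return None


def check_source_file(path):
    """Run first_non_utf8_byte() on the raw bytes of a file on disk."""
    return first_non_utf8_byte(Path(path).read_bytes())


def resave_as_utf8(path, legacy_encoding="cp1252"):
    """Re-encode a file to UTF-8, ASSUMING it was saved in `legacy_encoding`."""
    text = Path(path).read_bytes().decode(legacy_encoding)
    Path(path).write_bytes(text.encode("utf-8"))
```

Run check_source_file() on the script and on the module being imported; only call resave_as_utf8() once you are sure which encoding the file was really saved in, because a wrong guess silently produces different characters.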

Related

Accented characters in Python 2.7 [duplicate]

I'm running a recent Linux system where all my locales are UTF-8:
LANG=de_DE.UTF-8
LANGUAGE=
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
...
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=
Now I want to write UTF-8 encoded content to the console.
Right now Python uses UTF-8 for the FS encoding but sticks to ASCII for the default encoding :-(
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'UTF-8'
I thought the best (clean) way to do this was setting the PYTHONIOENCODING environment variable. But it seems that Python ignores it. At least on my system I keep getting ascii as default encoding, even after setting the envvar.
# tried this in ~/.bashrc and ~/.profile (also sourced them)
# and on the commandline before running python
export PYTHONIOENCODING=UTF-8
If I do the following at the start of a script, it works though:
>>> import sys
>>> reload(sys) # to enable `setdefaultencoding` again
<module 'sys' (built-in)>
>>> sys.setdefaultencoding("UTF-8")
>>> sys.getdefaultencoding()
'UTF-8'
But that approach seems unclean. So, what's a good way to accomplish this?
Workaround
Instead of changing the default encoding, which is not a good idea (see mesilliac's answer), I just wrap sys.stdout with a StreamWriter like this:
import codecs, locale, sys
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)
See this gist for a small utility function that handles it.
It seems accomplishing this is not recommended.
Fedora suggested using the system locale as the default,
but apparently this breaks other things.
Here's a quote from the mailing-list discussion:
The only supported default encodings in Python are:
Python 2.x: ASCII
Python 3.x: UTF-8
If you change these, you are on your own and strange things will
start to happen. The default encoding does not only affect
the translation between Python and the outside world, but also
all internal conversions between 8-bit strings and Unicode.
Hacks like what's happening in the pango module (setting the
default encoding to 'utf-8' by reloading the site module in
order to get the sys.setdefaultencoding() API back) are just
downright wrong and will cause serious problems since Unicode
objects cache their default encoded representation.
Please don't enable the use of a locale based default encoding.
If all you want to achieve is getting the encodings of
stdout and stdin correctly setup for pipes, you should
instead change the .encoding attribute of those (only).
--
Marc-Andre Lemburg
eGenix.com
This is how I do it:
#!/usr/bin/python2.7 -S
import sys
sys.setdefaultencoding("utf-8")
import site
Note the -S in the shebang line. It tells Python not to import the site module automatically. The site module is what sets the default encoding and then removes the method so it can't be set again, but it will honor an encoding that is already set.
How to print UTF-8 encoded text to the console in Python < 3?
print u"some unicode text \N{EURO SIGN}"
print b"some utf-8 encoded bytestring \xe2\x82\xac".decode('utf-8')
i.e., if you have a Unicode string then print it directly. If you have
a bytestring then convert it to Unicode first.
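The same rule carries over to Python 3, where str is the Unicode type and bytes is the bytestring type. A minimal sketch, assuming the incoming bytes are UTF-8 encoded; to_text is a hypothetical helper name:

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as str; bytes are decoded with `encoding` (assumed UTF-8)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Decode at the boundary, then print Unicode; the stream picks the output encoding.
message = to_text(b"some utf-8 encoded bytestring \xe2\x82\xac")
# print(message)  # prints the euro sign if the console can display it
```

Keeping the decode step at the point where bytes enter the program means the print call never has to know which encoding the terminal uses.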
Your locale settings (LANG, LC_CTYPE) indicate a utf-8 locale, and
therefore (in theory) you could print a utf-8 bytestring directly and it
should be displayed correctly in your terminal (if the terminal settings
are consistent with the locale settings, and they should be), but you
should avoid it: do not hardcode the character encoding of your
environment inside your script; print Unicode directly instead.
There are many wrong assumptions in your question.
You do not need to set PYTHONIOENCODING with your locale settings
to print Unicode to the terminal. A utf-8 locale supports all Unicode characters, i.e., it works as is.
You do not need the workaround sys.stdout =
codecs.getwriter(locale.getpreferredencoding())(sys.stdout). It may
break if some code (that you do not control) needs to print bytes,
and/or it may break while printing Unicode to the Windows console (wrong codepage, can't print undecodable characters). Correct locale settings and/or the PYTHONIOENCODING envvar are enough. Also, if you need to replace sys.stdout, use io.TextIOWrapper() instead of the codecs module, as the win-unicode-console package does.
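A minimal Python 3 sketch of the io.TextIOWrapper approach: wrap a binary stream in a text layer with an explicit encoding. The helper name text_writer is hypothetical, and this is not the win-unicode-console implementation, only the same building block.

```python
import io


def text_writer(binary_stream, encoding="utf-8"):
    """Layer a text stream over `binary_stream` that encodes str with `encoding`.

    line_buffering=True flushes on every newline, as a console stream would.
    """
    return io.TextIOWrapper(binary_stream, encoding=encoding, line_buffering=True)


# Usage (Python 3, assuming sys.stdout has a .buffer, which some IDE consoles lack):
#     import sys
#     sys.stdout.flush()
#     sys.stdout = text_writer(sys.stdout.buffer)
```

Unlike a codecs StreamWriter, the result is a regular io text stream, so it exposes .encoding and .buffer, and code that needs raw bytes can still write to .buffer.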
sys.getdefaultencoding() is unrelated to your locale settings and to
PYTHONIOENCODING. Your assumption that setting PYTHONIOENCODING
should change sys.getdefaultencoding() is incorrect. You should
check sys.stdout.encoding instead.
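To see which setting is which on a given machine, a small diagnostic can print them side by side. A sketch, with encoding_report as a hypothetical name; the comments reflect the distinctions described above.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the encoding settings that are easy to confuse with each other."""
    return {
        # Implicit str<->unicode conversions on Python 2; always 'utf-8' on Python 3.
        "default": sys.getdefaultencoding(),
        # Used to encode/decode file names.
        "filesystem": sys.getfilesystemencoding(),
        # What print() actually uses; None on Python 2 when stdout is a pipe.
        "stdout": getattr(sys.stdout, "encoding", None),
        # The locale's preference (LANG/LC_CTYPE on POSIX, the ANSI code page on Windows).
        "locale": locale.getpreferredencoding(False),
        # The override, if any.
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


if __name__ == "__main__":
    for key, value in sorted(encoding_report().items()):
        print("%s: %s" % (key, value))
```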
sys.getdefaultencoding() is not used when you print to the
console. It may be used as a fallback on Python 2 if stdout is
redirected to a file/pipe, unless PYTHONIOENCODING is set:
$ python2 -c'import sys; print(sys.stdout.encoding)'
UTF-8
$ python2 -c'import sys; print(sys.stdout.encoding)' | cat
None
$ PYTHONIOENCODING=utf8 python2 -c'import sys; print(sys.stdout.encoding)' | cat
utf8
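The shell experiment above can be repeated from Python 3 with a sketch like this one. It starts a child interpreter whose stdout is a pipe and reports the encoding the child sees, with or without PYTHONIOENCODING. Unlike the Python 2 output above, a Python 3 pipe never reports None; it falls back to the locale (or UTF-8 mode), so the override case is the interesting one. child_stdout_encoding is a hypothetical helper; names are normalized with codecs.lookup so "utf8" and "UTF-8" compare equal.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(pythonioencoding=None):
    """Return the canonical codec name of a child Python's piped stdout."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # start from "not set"
    if pythonioencoding is not None:
        env["PYTHONIOENCODING"] = pythonioencoding
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env, stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return codecs.lookup(result.stdout.strip()).name


if __name__ == "__main__":
    print("unset:", child_stdout_encoding())
    print("PYTHONIOENCODING=utf8:", child_stdout_encoding("utf8"))
```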
Do not call sys.setdefaultencoding("UTF-8"); it may corrupt your
data silently and/or break 3rd-party modules that do not expect
it. Remember that sys.getdefaultencoding() is used to convert bytestrings
(str) to/from unicode in Python 2 implicitly, e.g., "a" + u"b". See also
the quote in mesilliac's answer.
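For comparison, Python 3 removed that implicit conversion: mixing str and bytes raises TypeError, so the decode has to be written out. A minimal sketch; concat_text is a hypothetical helper and UTF-8 is an assumed input encoding.

```python
def concat_text(text, data, encoding="utf-8"):
    """Join str and bytes explicitly by decoding the bytes first.

    Python 2 did this implicitly with sys.getdefaultencoding(); Python 3
    refuses to guess, so "a" + b"b" raises TypeError.
    """
    return text + data.decode(encoding)


try:
    "a" + b"b"
except TypeError:
    implicit_conversion_allowed = False
else:
    implicit_conversion_allowed = True
```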
If the program does not display the appropriate characters on the screen
(e.g., it shows an invalid symbol instead),
run it with the following command line:
PYTHONIOENCODING=utf8 python3 yourprogram.py
Or the following, if your program is a globally installed module:
PYTHONIOENCODING=utf8 yourprogram
On some platforms, such as Cygwin (mintty.exe terminal) with Anaconda Python (or Python 3), simply running export PYTHONIOENCODING=utf8 and
then running the program does not work,
and you have to run PYTHONIOENCODING=utf8 yourprogram every time to run the program correctly.
On Linux, when using sudo, you can try passing the -E argument to preserve the user's environment variables in the sudo process:
export PYTHONIOENCODING=utf8
sudo -E python yourprogram.py
If you try this and it does not work, you will need to open a sudo shell:
sudo /bin/bash
PYTHONIOENCODING=utf8 yourprogram
Related:
How to print UTF-8 encoded text to the console in Python < 3?
Changing default encoding of Python?
Forcing UTF-8 over cp1252 (Python3)
Permanently set Python path for Anaconda within Cygwin
https://superuser.com/questions/1374339/what-does-the-e-in-sudo-e-do
Why bash -c 'var=5 printf "$var"' does not print 5?
https://unix.stackexchange.com/questions/296838/whats-the-difference-between-eval-and-exec
Although the OP's question is about Linux, if you end up here through a search engine: on Windows 10, the following fixes the issue:
set PYTHONIOENCODING=utf8
python myscript.py
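If setting an environment variable before every run is inconvenient, Python 3.7+ text streams can switch encoding in place with reconfigure(). A minimal sketch, assuming Python 3.7 or later; force_utf8 is a hypothetical name. It only changes the bytes Python emits; whether the console renders them still depends on its code page and font.

```python
import io


def force_utf8(stream):
    """Switch a text stream to UTF-8 in place (TextIOWrapper.reconfigure, 3.7+)."""
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return stream


# Demonstration on an in-memory stream that starts out as cp1252.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="cp1252")
force_utf8(out)
out.write("\u0144")  # LATIN SMALL LETTER N WITH ACUTE, not encodable in cp1252
out.flush()

# In a script: force_utf8(sys.stdout) near the top, before printing anything.
```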

Python 3.5.2 Non-Ascii Character Output

I'm running Python 3.5.2 and am trying to do some basic stuff with unicode and UTF-8. I'm currently just trying to output non-ASCII characters and am unable to do so. For example, this:
ddd = '\u0144'
print(ddd)
gives me a UnicodeEncodeError telling me that the character maps to <undefined>. From what I understand of Unicode in Python 3.5.2, the mapping should happen automatically. I tried putting # -*- coding: utf-8 -*- before the code, and various combinations of .decode and .encode as well, but to no avail.
PM 2Ring, typing chcp 65001 in the command prompt did the trick. Thanks!
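The "maps to <undefined>" wording comes from Python's code-page codecs: the console's code page simply has no slot for U+0144. Code page 65001 is UTF-8, which is why chcp 65001 helped. A sketch that checks this directly; cp437 and cp1252 are assumed examples of a legacy console code page, and encodes_cleanly is a hypothetical helper.

```python
def encodes_cleanly(text, encoding):
    """Return True if `text` can be encoded with `encoding` without errors."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


ddd = "\u0144"  # LATIN SMALL LETTER N WITH ACUTE
for codec in ("cp437", "cp1252", "utf-8"):
    print(codec, encodes_cleanly(ddd, codec))

# errors="replace" avoids the exception, at the cost of printing "?".
fallback = ddd.encode("cp1252", errors="replace")
```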

Python console can't deal with Unicode?

(I'm using Python 3.4 for this, on Windows)
So, I have this code I whipped up to better show my troubles:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
os.startfile('C:\\téxt.txt')
In IDLE it works as it should (it just opens the file I specified), but in the console (double-click) it keeps saying Windows can't find the file. Of course, if I try to open "text.txt" instead, it works perfectly, as long as it exists.
It's slowly driving me insane. Someone help me, please.
You are using the wrong encoding; try using cp1252:
# -*- coding: cp1252 -*-
Your file name is 'C:\téxt.txt'; try using 'C:\text.txt'.
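The answer attributes the failure to an encoding mismatch. As a hedged illustration of what a mismatch does to a path literal, assuming a file saved as UTF-8 but decoded as cp1252: the same bytes become a different file name. A \u escape sidesteps the question, because the literal then contains only ASCII in the source file.

```python
# The bytes a UTF-8 editor would save for this literal.
utf8_bytes = "C:\\t\u00e9xt.txt".encode("utf-8")

as_utf8 = utf8_bytes.decode("utf-8")     # the intended path, with "é"
as_cp1252 = utf8_bytes.decode("cp1252")  # "é" turns into "Ã©": no such file

# ASCII-only source: independent of the file's coding declaration.
portable_path = "C:\\t\u00e9xt.txt"
# os.startfile(portable_path)  # Windows only
```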

How to display unicode string in its natural language in PyDev console [duplicate]

My configuration: Win7 + Python 2.6 + eclipse + PyDev
How do I enable Unicode print statements in:
PyDev console in eclipse
Idle Python GUI
Example print statement:
print(u"שלום עולם")
This comes out as:
ùìåí òåìí
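The garbled text is exactly what Hebrew produces when it is encoded with the Windows Hebrew code page (cp1255) and the bytes are then displayed as Western cp1252. The sketch below reproduces that; it only shows the output is consistent with such a mismatch and does not tell you which component (Python, PyDev, or the console) did which step.

```python
hello = "\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd"  # the Hebrew text above

# Encode as Hebrew cp1255, then misread the bytes as Western cp1252.
garbled = hello.encode("cp1255").decode("cp1252")

# Reversing the two steps recovers the original text.
repaired = garbled.encode("cp1252").decode("cp1255")
```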
For eclipse unicode console support:
Add -Dfile.encoding=UTF-8 to eclipse.ini which is in the eclipse install directory.
In eclipse - Run\Run Configurations\Python Run\configuration\Common\, make sure UTF-8 is selected
In eclipse - Window\Preferences\General\Workspace\Text file encoding\, make sure UTF-8 is selected
In [python install path]\Lib\site.py - change from encoding = "ascii" to encoding = "utf-8"
Make sure you're using unicode supporting fonts in eclipse - Window\Preferences\Appearance\Colors and Fonts\Debug\Console font\Edit
In the installation I did all of the above:
print(u"שלום עולם") # Doesn't work
print("שלום עולם") # Works
For django models:
print(my_model.my_field) # Doesn't work
print(my_model.my_field.encode('utf-8')) # Works
I was having the same problem in Eclipse Luna 4.0.4 with Python 3.4.1 and PyDev 3.6.0. I tried the steps given above, and a few others, and was getting nowhere.
What worked for me was, in Eclipse, in Preferences > PyDev > Interpreters > Python Interpreter, in the Environment tab, I added the environment variable PYTHONIOENCODING and specified its value as utf-8.
That did the trick for me…
PYTHONIOENCODING is a pretty good generic way of fixing this problem. However, the Eclipse way of setting the locale of its console is as follows:
Set the Run Configuration encoding:
Edit Run Configuration
Click on "Common" tab
Set Encoding to "UTF-8"
