How to solve UnicodeDecodeError in Python 3.6? - python

I switched from Python 2.7 to Python 3.6.
I have scripts that deal with some non-English content.
I usually run scripts via Cron and also in Terminal.
I had UnicodeDecodeError in my Python 2.7 scripts and solved it with this:
# encoding=utf8
import sys
reload(sys)
sys.setdefaultencoding('utf8')
Now in Python 3.6, it doesn't work. I have print statements like print("Here %s" % (myvar)) and they throw an error. I can work around the issue by replacing myvar with myvar.encode("utf-8"), but I don't want to write that in every print statement.
I set PYTHONIOENCODING=utf-8 in my terminal and I still have the issue.
Is there a cleaner way to solve the UnicodeDecodeError issue in Python 3.6?
Is there any way to tell Python 3 to print everything in UTF-8, just like I did in Python 2?

It sounds like your locale is broken and you have another bytes->Unicode issue. The thing you did for Python 2.7 is a hack that only masked the real problem (there's a reason why you have to reload sys to make it work).
To fix your locale, try typing locale at the command line. The output should look something like:
LANG=en_GB.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_ALL=
locale depends on LANG being set properly. Python effectively uses the locale to work out what encoding to use when writing to stdout. If it can't work it out, it defaults to ASCII.
You should first attempt to fix your locale. If the locale command reports errors, make sure you've installed the correct language pack for your region.
If all else fails, you can always fix Python by setting PYTHONIOENCODING=UTF-8. This should be used as a last resort as you'll be masking problems once again.
If Python is still throwing an error after setting PYTHONIOENCODING then please update your question with the stacktrace. Chances are you've got an implied conversion going on.
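To see what Python has actually picked up from the locale, a small diagnostic can help. This is a minimal sketch and not part of the original answer; the function name describe_io_encoding is made up for illustration.

```python
import locale
import os
import sys


def describe_io_encoding():
    """Collect the settings that influence how print() encodes text."""
    return {
        # getattr guards against stdout replacements that lack .encoding
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "preferred_encoding": locale.getpreferredencoding(False),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


if __name__ == "__main__":
    for name, value in describe_io_encoding().items():
        print("%s = %r" % (name, value))
```

Running it once from the terminal and once from Cron should show whether the two environments differ, since Cron jobs often start with a much smaller set of environment variables.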

I had this issue when using Python inside a Docker container based on Ubuntu 18.04.
It appeared to be a locale issue, which was solved by adding the following to the Dockerfile:
ENV LANG C.UTF-8

To everyone using pickle to load a file previously saved in Python 2 and getting a UnicodeDecodeError: try setting pickle's encoding parameter:
import pickle
with open("./data.pkl", "rb") as data_file:
    samples = pickle.load(data_file, encoding='latin1')
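The effect of the encoding parameter can be reproduced without a real Python 2 file. The sketch below uses a hand-written protocol-2 pickle as stand-in data (an assumption for illustration, not something from the answer); the helper name load_py2_pickle is also hypothetical.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 byte string 'caf\xe9'.
# This is assumed sample data standing in for a file saved by Python 2.
PY2_PICKLE = b"\x80\x02U\x04caf\xe9q\x00."


def load_py2_pickle(data):
    """Decode Python 2 str objects as Latin-1, which accepts every byte value."""
    return pickle.loads(data, encoding="latin1")


if __name__ == "__main__":
    try:
        pickle.loads(PY2_PICKLE)  # the default encoding is ASCII
    except UnicodeDecodeError as exc:
        print("default load failed:", exc)
    # ascii() keeps the output printable even on an ASCII-only terminal.
    print(ascii(load_py2_pickle(PY2_PICKLE)))
```

Latin-1 never fails to decode, but it only gives the right characters if the original bytes really were Latin-1. If the Python 2 strings were UTF-8, encoding='bytes' followed by an explicit decode may be the safer route.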

For a Python-only solution you will have to recreate your sys.stdout object:
import sys, codecs
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.detach())
After this, a normal print("hello world") should be encoded to UTF-8 automatically.
But you should try to find out why your terminal is set to such a strange encoding (which Python just tries to adopt). Maybe your operating system is misconfigured somehow.
EDIT: In my tests unsetting the env variable LANG produced this strange setting for the stdout encoding for me:
LANG= python3
import sys
sys.stdout.encoding
printed 'ANSI_X3.4-1968'.
So I guess you might want to set your LANG to something like
en_US.UTF-8. Your terminal program doesn't seem to do this.
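An equivalent way to build such a stream is io.TextIOWrapper, which makes the wrapping step easy to try against an in-memory buffer first. This is a sketch under that assumption; the helper name utf8_text_stream is hypothetical.

```python
import io


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


if __name__ == "__main__":
    # Demonstrate on an in-memory buffer instead of the real stdout.
    buffer = io.BytesIO()
    stream = utf8_text_stream(buffer)
    print("h\u00e9llo w\u00f6rld", file=stream)
    print(buffer.getvalue())
```

In a script, the same helper would be applied once near the top as sys.stdout = utf8_text_stream(sys.stdout.detach()). On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") achieves the same without replacing the object, but that method is not available in 3.6.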

Python 3 (including 3.6) already supports Unicode. Here is the doc: https://docs.python.org/3/howto/unicode.html
So you don't need to force Unicode support as you did in Python 2.7. Try to run your code normally. If you get an error while reading a Unicode text file, pass the encoding='utf-8' parameter to open() when reading the file.
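As a minimal sketch of the explicit-encoding advice, the helper below (the name read_text_utf8 and the sample file are made up for illustration) reads a file as UTF-8 regardless of the locale:

```python
import os
import tempfile


def read_text_utf8(path):
    """Read a whole text file, decoding it explicitly as UTF-8."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    # Write a small sample file so the example is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.txt")
        with open(sample, "w", encoding="utf-8") as handle:
            handle.write("na\u00efve caf\u00e9\n")
        print(ascii(read_text_utf8(sample)))
```

Without the encoding argument, open() falls back to the locale's preferred encoding, which is why the same script can behave differently under Cron and in a terminal.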

For Docker with Python 3.6, prefixing the command with LANG=C.UTF-8 (for example LANG=C.UTF-8 python or LANG=C.UTF-8 jupyter xxx) works for me, thanks to @Daniel and @zhy.

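I mean you could write a custom function like this (not optimal, I know):
import sys
def printUTF8(input):
    # print() on bytes would show b'...' in Python 3, so write the bytes directly
    sys.stdout.buffer.write(input.encode("utf-8") + b"\n")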

Related

Trying to run python code on jenkins in ubuntu

Hi all.
I recently started working with Jenkins, in an attempt to replace cron jobs with a Jenkins pipeline. I know only a little programming jargon; what I know, I learned from questions on Stack Overflow. So, if you need any more info, I would really appreciate it if you used plain English.
So, I installed the latest version of Jenkins and the suggested plugins, plus all the plugins that I thought could be useful for running Python.
Afterwards, I searched stackoverflow and other websites to make this work, but all I could do was
#!/usr/bin/env python
from __future__ import print_function
print('Hello World')
And it succeeded.
Currently, Jenkins is running on Ubuntu 16.04, and I am using anaconda3's python (~/anaconda3/bin/python).
When I tried to run slightly more complicated Python code (by that I mean import pandas), it gave me an import error.
What I have tried so far is
execute python script build: import pandas - import error
execute shell build: import pandas (import pandas added to the code that worked above)
python builder build: import pandas - invalid interpreter error
pipeline job: sh python /path_to_python_file/*.py - import error
All gave errors. Since 'hello world' works, I believe that using anaconda3's python is not an issue. Also, it imported print_function just fine, so I want to know what I should do from here. Change workspace setting? workdirectory setting? code changes?
Thanks.
Since 'hello world' works, I believe that using anaconda3's python is not an issue.
Your assumption is wrong.
There are multiple ways of solving the issue, but they all come down to using the correct Python interpreter, one that has pandas installed. Usually on Ubuntu you'll have at least two interpreters, one for Python 2 and one for Python 3, and you'll use them in the shell by calling either python pth/to/myScript.py or python3 pth/to/myScript.py. In this case python and python3 are just labels that point to the correct executables via the PATH environment variable.
By installing anaconda3 you are adding one more interpreter, with pandas and plenty of other preinstalled packages. If you want to use it, you need to tell your shell or Jenkins about it somehow. If import pandas gives you an error, then you're probably using a different interpreter or a different Python environment (but this is out of scope here).
Coming back to your script
Following this Stack Overflow answer, you'll see that all the line #!/usr/bin/env python does is make sure that you're using the first python interpreter on your Ubuntu environment's PATH, which almost certainly isn't the one you installed with anaconda3. Most likely it will be the default Python 2 distributed with Ubuntu. To check exactly which interpreter is running your script, replace 'Hello World' with:
#!/usr/bin/env python
import sys
print(sys.executable) # this line will give you the exact path to the interpreter
print(sys.version) # this one will give you the version
Ok, so what to do?
Well, run your script using the correct interpreter. Remove #!/usr/bin/env python from your file and if you have a pipeline, add there:
sh "/home/yourname/anaconda3/bin/python /path_to_python_file/myFile.py"
It will most likely solve the issue. It's also quite flexible in the sense that if you ever want to use this python file on a different machine, you won't have your username hardcoded inside.
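To confirm from inside a Jenkins job that the chosen interpreter can actually see pandas, a short check like the following may help. It is a sketch written to run on both Python 2 and Python 3; the helper name can_import is hypothetical.

```python
from __future__ import print_function

import sys


def can_import(module_name):
    """Return True if this interpreter is able to import module_name."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    print("pandas importable:", can_import("pandas"))
```

If the job prints an interpreter path outside ~/anaconda3, the import error comes from running the wrong Python rather than from the script itself.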

is it possible to load python module from unicode path on windows?

I'm trying to understand if it is simply impossible to load a python module from a unicode path, or if there is some trick I am missing.
This bug report seems to imply that it is not possible:
http://bugs.python.org/issue11619
Goal:
Suppose C:\Users\pkarasev\г contains Foo.py; then I want to do this:
import sys
sys.path.append(str('c:/Users/pkarasev/\xd0\xb3').decode('utf-8') )
from Foo import *
This fails with "cannot find module..." although u'c:/Users/pkarasev/\u0433' has been added to my sys.path and U+0433 is the correct code point for г.
note that the str(...).decode(...) method works for things like os.open, but for some reason not for loading modules. Is there a different format for the encoding? Is this action impossible, period? Do I need to use python 3.x instead of 2.7.3 with some different syntax?
Edit: a cash award is available if someone knows a trick to do this (on Windows).
Yeah it is either a bug in python for not supporting windows, or a bug in windows for not being sane and using utf-8 encoding. In python27.dll you can step in and see the bogus module paths...
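On the Python 3 part of the question: there, str is Unicode and sys.path entries are text, so as far as I know a non-ASCII directory can simply be appended. The sketch below is an illustration under that assumption, using a temporary directory and a made-up module name (foo_demo) rather than the asker's real path; it has not been checked against the Python 2.7 setup in the question.

```python
import importlib
import os
import sys
import tempfile


def import_from_directory(directory, module_name):
    """Append directory to sys.path and import module_name from it."""
    if directory not in sys.path:
        sys.path.append(directory)
    importlib.invalidate_caches()  # make sure freshly created files are seen
    return importlib.import_module(module_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # "\u0433" is the character that the bytes \xd0\xb3 decode to in UTF-8.
        unicode_dir = os.path.join(tmp, "\u0433")
        os.mkdir(unicode_dir)
        module_path = os.path.join(unicode_dir, "foo_demo.py")
        with open(module_path, "w", encoding="utf-8") as handle:
            handle.write("VALUE = 42\n")
        module = import_from_directory(unicode_dir, "foo_demo")
        print(module.VALUE)
```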

Python 2.7: "unresolved import: ConfigParser"

I recently wrote a Python 2.7 script (using PyDev on Eclipse) that took advantage of the built-in ConfigParser module, and the script works perfectly. But when I exported it and sent it to a colleague, he could not get it to work. He keeps getting an "unresolved import: ConfigParser" error even though we are using the exact same settings. This isn't supposed to happen as ConfigParser is built-in.
I've Googled everywhere but could not seem to find any working solution. Any help would be appreciated.
ConfigParser was renamed to configparser in python 3. Chances are he's using 3 and cannot find the old py2 name.
You can use:
try:
    import configparser as ConfigParser
except ImportError:
    import ConfigParser
To see what's happening it may be nice comparing on both computers which sys.path is being used (i.e.: put at the start of the module being run the code below and compare the output in each case):
import sys
print('\n'.join(sorted(sys.path)))  # parenthesized form works on Python 2 and 3
Now, if the error does not occur when running the code (i.e.: it runs fine and you get no exceptions), and he gets the error only in PyDev, then the interpreter configuration on his side is probably not correct and one of the paths printed by the code above is not being added to the PYTHONPATH. It could be that he's on a virtual env and didn't add the paths to the original /Lib, or has added some path that shouldn't be there, or even has some ConfigParser module somewhere else which is conflicting with the one from the Python standard library.

Why does simplejson work in Terminal and not TextMate?

I'm using simplejson to get data from the New York Times API. It works when I run the file through the terminal with the command "python test.py" but not when I run it through TextMate using command + R. I'm running the exact same file. Why is this?
I am running Snow Leopard 10.6.4, TextMate 1.5.10, and Python 2.6.4.
Edit: Sorry for forgetting to include this: by "doesn't work," I mean it says "No module named simplejson". I also noticed that this happens for PyMongo as well ("No module named pymongo").
What doesn't work? You should provide more information like error messages and what-not. However, I assume that the version of python is different, and simplejson isn't on your PYTHONPATH when launched from textmate.
Just so you know, simplejson was incorporated into the Python 2.6 distribution's standard library as json. So if you don't feel like wrestling with the import problem, try simply changing all your references to simplejson to json instead.
But, as suggested, this is going to turn out to be a PythonPath issue. Run these lines in the Python interpreter and from TextMate and compare the results.
import sys
print sys.path
To find out where simplejson is installed (if you don't know), do this in the Python interpreter:
import simplejson
print simplejson.__file__
If you want/need to set PYTHONPATH manually for TextMate, you can do that by adding it under Preferences > Advanced > Shell Variables.
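The suggestion to use the standard library's json in place of simplejson can be written as a fallback import, so the same file runs in both environments. This is a sketch; the helper name parse_payload and the sample payload are made up for illustration.

```python
try:
    import simplejson as json  # third-party package, used when installed
except ImportError:
    import json  # standard library since Python 2.6


def parse_payload(text):
    """Parse a JSON document with whichever implementation was imported."""
    return json.loads(text)


if __name__ == "__main__":
    # Hypothetical payload, not a real API response.
    print(parse_payload('{"status": "OK", "num_results": 2}'))
```

This avoids the import error in TextMate, although it does not explain it; comparing sys.path in both environments is still the way to find the underlying difference.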

Python japanese module is not found

I ran the following Python script:
pygame2exe.py
ImportError: No module named japanese
What's wrong?
Do you know of a solution?
The script makes use of japanese encoding
# -*- coding: sjis -*-
[...]
args.append('japanese,encodings');
It's a shame, because it could use UTF-8, which works out of the box.
You can't run this script unless you install the japanese module. I can't find any reference to it on the web, and in the code I can read:
# make standalone, needs at least pygame-1.5.3 and py2exe-0.3.1
# fixed for py2exe-0.6.x by RyoN3 at 03/15/2006
If you haven't installed the latest versions of pygame and py2exe, I would start with that, since they may embed the module you need.
To add to e-satis' explanation, the "japanese" module is provided by the Japan PUG, but I don't think you've actually needed it since around Python 2.2. I believe that all the Japanese codecs are included in a standard Python install these days. I certainly don't use this module, and I handle SJIS in my programs just fine.
So I think you could just get rid of the forced import, and do fine. That is, delete these lines:
args.append('-p')
args.append('japanese,encodings') # force-include JapaneseCodec
Since you don't have the "japanese" module on your system, if the program runs OK on your system, then the frozen version should be OK without this module.
However, I would recommend using Unicode throughout instead of byte strings, and if you insist on byte strings, I'd at least put them in UTF-8.
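A quick way to check that the Shift JIS codec ships with a standard install, without the separate japanese module, is to round-trip a string through it. This is a sketch written for Python 3 (the thread itself is about Python 2), and the helper name roundtrip_sjis is hypothetical.

```python
import codecs


def roundtrip_sjis(text):
    """Encode text as Shift JIS and decode it back using the built-in codec."""
    encoded = text.encode("shift_jis")
    return encoded, encoded.decode("shift_jis")


if __name__ == "__main__":
    # codecs.lookup raises LookupError if the codec is not available.
    print(codecs.lookup("sjis").name)
    sample = "\u65e5\u672c\u8a9e"  # the word for "Japanese language"
    encoded, decoded = roundtrip_sjis(sample)
    print(len(encoded), decoded == sample)
```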
