Problems with Encoding in Eclipse Console and Python - python

I need some help with encodings in Python (2.6) and Eclipse. I searched Google and Stack Overflow and tried a lot of things, but I still don't get it.
How do I get the Eclipse console to display characters such as äöü?
I tried:
Declaring the document encoding in the first line with
# -*- coding: utf-8 -*-
I changed the encoding settings in Window/Preferences/General/Workspace and Project/Properties to UTF-8
As nothing changed I tried the following things alone and in combination but nothing seemed to work out:
Changing the stdout as mentioned in the Python Cookbook:
sys.stdout = codecs.lookup("utf-8")[-1](sys.stdout)
Adding a unicode u prefix:
print u"äöü".encode('UTF8')
reloading sys (I don't know what for but it doesn't work either ;-))
I am trying to do this in order to debug the encoding-problems I have in my programs... (argh)
Any ideas? Thanks in advance!
EDIT:
I work on Windows 7 and it is EasyEclipse

Got it! If you have the same problem, go to
Run/Run Configurations/Common and select UTF-8 (for example) as the console encoding.
So, finally, print "ö" results in "ö"
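To see which encoding Python believes the console uses before and after changing the run configuration, a small diagnostic along these lines may help. It is only a sketch and not part of the original answer: `describe_encodings` is a hypothetical helper name, and the code is written for Python 3.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings that influence what print() sends to the console."""
    return {
        # Encoding Python uses when writing text to stdout (None if unknown).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding preferred by the platform locale.
        "locale": locale.getpreferredencoding(False),
        # Default string encoding of the interpreter.
        "default": sys.getdefaultencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_encodings().items()):
        print("%s: %s" % (name, value))
```

If the "stdout" entry does not match the console encoding chosen in Eclipse, that mismatch is a likely source of the garbled output.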

Even though this is a fairly old question, I'm new to Stack Overflow and would like to contribute. You can change the default encoding for all text editors in Eclipse (currently Neon) from the menu Window -> Preferences -> General -> Workspace : Text file encoding


Strange symbols instead of Cyrillic Django

I'm trying to add data to the DB with an external script.
In this script, I first create a list of model instances and then add them to the DB with the bulk_create method.
from shop.models import SpeciesOfWood
species_of_wood = [
SpeciesOfWood(title="Ель"),
SpeciesOfWood(title="Кедр"),
SpeciesOfWood(title="Пихта"),
SpeciesOfWood(title="Лиственница")
]
SpeciesOfWood.objects.bulk_create(species_of_wood)
This code adds the rows to the DB, but the values I wanted to add show up as strange symbols instead of Cyrillic (screenshot not included).
I already tried to add:
# -*- coding: utf-8 -*-
u prefix to title values
But it didn't change anything.
UPD 1
I tried to create models myself like SpeciesOfWood.objects.create(...) and it also doesn't change anything.
UPD 2
I tried to add Cyrillic data via the admin panel, and it works: the data looks the way I wanted. I still don't know why data added via the script gets the wrong encoding while data added via the admin panel is fine.
UPD 3
I tried to use SpeciesOfWood.objects.create(...) via python manage.py shell, and it works well if I type it by hand. In case it is useful, this is how I execute the dummy data script:
$ python manage.py shell
>>> exec(open("my_script.py").read())
This looks suspiciously like your database is misconfigured, or the software with which you're reading the data back is: the characters in the table image correspond to your original data encoded to UTF-8 and then decoded as Windows-1251 (a "legacy" Cyrillic encoding, although Wikipedia tells me it remains extremely popular):
>>> print("Ель".encode('utf-8').decode('windows-1251'))
Ель
This means either the database is configured such that the reader assumes Windows-1251 encoding, or the software you use to view the database content assumes the database returns data in whatever encoding is set up on the system (and your desktop environment is configured for Cyrillic using windows-1251 / cp1251).
Either way it doesn't look like an issue with the input to me as the data is originally encoded / stored as UTF-8.
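The diagnosis above can be checked in both directions. The following sketch (variable names are illustrative only) garbles the text the way the answer describes and then reverses the two steps; if the reverse trip restores the original, the stored bytes were valid UTF-8 and only the reader's assumed encoding was wrong.

```python
# -*- coding: utf-8 -*-
original = u"Ель"

# Encode as UTF-8 bytes, then misread those bytes as Windows-1251.
garbled = original.encode("utf-8").decode("windows-1251")

# Undo the misreading: back to the raw bytes, then decode them as UTF-8.
repaired = garbled.encode("windows-1251").decode("utf-8")

print(garbled)
print(repaired == original)
```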
The answer lies in how I was executing the script. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns)... I was mistaken in thinking that the default encoding when reading a file is UTF-8.
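A way to avoid depending on the platform default is to pass the encoding explicitly when reading the script. The sketch below is an assumption-laden illustration rather than the asker's actual fix: `read_script` is a hypothetical helper, and a temporary file stands in for my_script.py.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

SOURCE = u'title = "Ель"\n'


def read_script(path, encoding="utf-8"):
    """Read a script's text, decoding its bytes with an explicit encoding."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


# Write a throwaway UTF-8 file standing in for my_script.py.
fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as handle:
    handle.write(SOURCE.encode("utf-8"))

# Reading with the wrong encoding reproduces the garbled titles.
misread = read_script(path, encoding="windows-1251")

# Reading as UTF-8 and executing gives the intended value.
namespace = {}
exec(read_script(path), namespace)
os.remove(path)
```

In the Django shell, the equivalent one-liner would be exec(open("my_script.py", encoding="utf-8").read()) on Python 3.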

VScode's Python Debug Console doesn't print Unicode Chinese Characters Properly

Here is the code to illustrate the problem:
# -*- coding:utf-8 -*-
text = u"严"
print text
If I run the code above in the VSCode debugger, it prints "涓" instead of "严". That is the first 2 bytes (\xe4\xb8) of the UTF-8 encoding of u"严" (\xe4\xb8\xa5), decoded with the gbk codec: \xe4\xb8 in gbk is "涓".
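The byte-level explanation can be reproduced directly. This is a small sketch (variable names are illustrative) that encodes the character as UTF-8 and decodes only the first byte pair as GBK, as described above.

```python
# -*- coding: utf-8 -*-
text = u"严"
utf8_bytes = text.encode("utf-8")  # three bytes: e4 b8 a5

# GBK is a double-byte encoding, so a GBK console pairs up the first two bytes.
first_pair = utf8_bytes[:2].decode("gbk")

print(repr(utf8_bytes))
print(first_pair)
```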
However, if I run the same code in PyCharm it prints "严" exactly as I expected, and the same happens if I run it in PowerShell.
It is weird that the VSCode Python debugger behaves differently from the Python interpreter. How can I get the correct print result? I do not think adding a decode("gbk") to the end of every text would be a good idea.
My Environment data
VS Code version: 1.21
VSCode Python Extension version : 2018.2.1
OS and version: Windows 10
Python version : 2.7.14
Type of virtual environment used : No
For Windows users: in your system variables, add a PYTHONIOENCODING variable, set its value to UTF-8, then restart VSCode. This worked on my PC.
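The effect of PYTHONIOENCODING can be observed without touching the system settings by starting a child interpreter with the variable set. This is a sketch for Python 3; `child_stdout_encoding` is a hypothetical helper name.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Start a child Python with PYTHONIOENCODING set and report its stdout encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    return output.decode("ascii").strip()


if __name__ == "__main__":
    print(child_stdout_encoding("utf-8"))
```

The child reports the encoding taken from the variable rather than the one derived from the console or locale, which is why setting it system-wide changes what the debug console receives.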
Modify the tasks.json file in VSCode; I am not sure whether this still works with version 2.0.
You can find it here: Changing the encoding for a task output
or here on GitHub:
Tasks should support specifying the output encoding
Add this at the start of your Python script (Python 3 only, since sys.stdout.buffer does not exist in Python 2):
import io
import sys
sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf8')
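To see what that wrapper does without replacing the real sys.stdout, the same idea can be tried on an in-memory byte stream. This is a sketch under the assumption of Python 3; `utf8_text_stream` is a hypothetical helper and `io.BytesIO` merely stands in for sys.stdout.buffer.

```python
# -*- coding: utf-8 -*-
import io


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8")


fake_stdout = io.BytesIO()  # stands in for sys.stdout.buffer
writer = utf8_text_stream(fake_stdout)
writer.write(u"严")
writer.flush()  # push the encoded bytes through to the underlying stream
captured = fake_stdout.getvalue()
```

The bytes that reach the underlying stream are always UTF-8, regardless of the console's code page, so this only helps when the console itself interprets its input as UTF-8.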
If you open your Python file in VS2017 you can do the following:
Go to File -> Save selected item as
Click the down-arrow next to the "Save" button
Click "Save With Encoding..."
Select the encoding you need
If the .py file was already saved, choose "Yes" when asked to overwrite it
Select, for example, "Chinese Simplified (GB18030) - Codepage 54936"
Also, add the following on line 2 of your .py file:
# -*- coding: gb18030 -*- or # -*- coding: gb2312 -*-
Those encodings accept your 严 character.
Nice link to an encoder/decoder tester here.

How to write utf8 to standard output in a way that works with python2 and python3

I want to write a non-ASCII character, let's say →, to standard output. The tricky part seems to be that some of the data that I want to concatenate to that string is read from JSON. Consider the following simple JSON document:
{"foo":"bar"}
I include this because if I just want to print → then it seems enough to simply write:
print("→")
and it will do the right thing in python2 and python3.
So I want to print the value of foo together with my non-ASCII character →. The only way I found to do this that works in both python2 and python3 is:
getattr(sys.stdout, 'buffer', sys.stdout).write(data["foo"].encode("utf8")+u"→".encode("utf8"))
or
getattr(sys.stdout, 'buffer', sys.stdout).write((data["foo"]+u"→").encode("utf8"))
It is important to not miss the u in front of → because otherwise a UnicodeDecodeError will be thrown by python2.
Using the print function like this:
print((data["foo"]+u"→").encode("utf8"), file=(getattr(sys.stdout, 'buffer', sys.stdout)))
doesn't seem to work, because python3 will complain TypeError: 'str' does not support the buffer interface.
Did I find the best way or is there a better option? Can I make the print function work?
The most concise I could come up with is the following, which you may be able to make more concise with a few convenience functions (or even replacing/overriding the print function):
# -*- coding=utf-8 -*-
import codecs
import os
import sys
# if you include the -*- coding line, you can use this
output = 'bar' + u'→'
# otherwise, use this
output = 'bar' + b'\xe2\x86\x92'.decode('utf-8')
if sys.stdout.encoding == 'UTF-8':
    print(output)
else:
    output += os.linesep
    if sys.version_info[0] >= 3:
        sys.stdout.buffer.write(bytes(output.encode('utf-8')))
    else:
        codecs.getwriter('utf-8')(sys.stdout).write(output)
The best option is using the -*- encoding line, which allows you to use the actual character in the file. But if for some reason, you can't use the encoding line, it's still possible to accomplish without it.
This (both with and without the encoding line) works on Linux (Arch) with python 2.7.7 and 3.4.1.
It also works if the terminal's encoding is not UTF-8. (On Arch Linux, I just change the encoding by using a different LANG environment variable.)
LANG=zh_CN python test.py
It also sort of works on Windows, which I tried with 2.6, 2.7, 3.3, and 3.4. By sort of, I mean I could get the '→' character to display only on a mintty terminal. On a cmd terminal, that character would display as 'ΓåÆ'. (There may be something simple I'm missing there.)
If you don't need to print to sys.stdout.buffer, then the following should print fine to sys.stdout. I tried it in both Python 2.7 and 3.4, and it seemed to work fine:
# -*- coding=utf-8 -*-
print("bar" + u"→")
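The getattr trick from the question can be folded into a small convenience function, as the answer suggests. The following is a sketch rather than a definitive solution: `write_utf8` is a hypothetical name, and an in-memory byte stream stands in for stdout so the result can be inspected.

```python
# -*- coding: utf-8 -*-
import io
import json
import sys


def write_utf8(text, stream=None):
    """Write text as UTF-8 bytes to stream (default: sys.stdout)."""
    stream = sys.stdout if stream is None else stream
    # Python 3 text streams expose their byte stream as .buffer;
    # Python 2's sys.stdout accepts bytes directly.
    target = getattr(stream, "buffer", stream)
    target.write(text.encode("utf-8"))


data = json.loads('{"foo": "bar"}')
sink = io.BytesIO()  # stand-in for a byte-oriented stdout
write_utf8(data["foo"] + u"→\n", sink)
```

Calling write_utf8(data["foo"] + u"→\n") without a stream writes to the real standard output in the same way.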

Python and EasyEclipse: identical code but results differ (encoding)

I can't believe it. After I solved my question in Problems with Encoding in Eclipse Console and Python I thought it wouldn't happen again that I got problems here. But now this:
I have a program test.py in the project TestMe that looks like this:
print "ö"
-> Run as... Python Run results in
ö
So far so good. When I now copy the program in EasyEclipse by right click/copy and paste I receive the program copy of test.py in the same project that looks exactly the same:
print "ö"
-> Run as... Python Run results in
ö
I noticed that the file properties changed from Encoding UTF-8 to Default, but changing it back to UTF-8 doesn't help here either.
Another difference between the two files is the line ending which is "Windows" in the original file and "Unix" in the copy (great definition of copy, btw). Changing this in Notepad++ also doesn't change anything.
I am perplexed...
Set up:
Python 2.5
Windows 7
Easy Eclipse 1.2.2.2
Settings that I've set to UTF-8 / Windows:
Project/Rightclick/Properties
File/Rightclick/Properties
Window/Preferences/Workspace
There are several places to change the encoding, broadest scope first:
Workspace: Window > Preferences > General > Workspace
Project Properties
File Properties
Run Configuration
Using the first method is the most useful one, as the others, including the console, inherit from it by default, which is probably what you want.

How to work with UTF-8 in Python 2.7?

I just want to get UTF-8 working. I tried this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
t = "одобрение за"
print t
But when I run this program from the command line, output looks like: одобрение за
I've searched up and down the net, tried the whole sys.setdefaultencoding thing, tried calling encode() and decode(), tried placing the little "u" in front, tried unicode(), etc.
I'm about ready to explode from frustration. Is there a definitive answer for what the heck you're supposed to do?
Your code works for me (tm)
In [1]: t = u"одобрение за"
In [2]: print t
одобрение за
Make sure your terminal supports UTF-8. One way is to check the LANG env-variable:
$ echo $LANG
en_US.UTF-8
also, try the locale command.
$LANG/locale just tells you what your system will use when writing to stdout/stderr.
Best way to test if terminal supports UTF-8 is probably to print something to it and see if it looks correct. Something like this:
echo -e '\xe2\x82\xac'
You should get a €-sign.
If not, try a different shell...
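A similar check can be made from inside Python by asking whether the encoding Python uses for stdout can represent the character at all. This is only a sketch for Python 3 with a hypothetical helper name, `can_encode`; it tests Python's side of the pipeline, not what the terminal actually renders.

```python
# -*- coding: utf-8 -*-
import sys


def can_encode(text, encoding):
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


if __name__ == "__main__":
    encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    print("stdout encoding:", encoding)
    print("can encode euro sign:", can_encode(u"\u20ac", encoding))
```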
Since you are using Windows cmd.exe, you have to follow these steps:
Make sure your console is using the Lucida Console font family (other fonts cannot display UTF-8 properly).
Type chcp 65001 (that's change codepage) and hit enter.
Run your command.
For subsequent runs (once you close the cmd.exe window), you'll have to change the codepage again. The font should be permanent.
