How to print Japanese UTF-8 text on the console in Windows?

#coding=<utf8>
import os
os.popen('chcp 65001')
a = 'こんにちは世界'
print a.decode('utf8')
x = raw_input()
Python 2.6 on Windows 7.
It runs in IDLE with no errors.
However, when run from the console, it raises an error and the window flashes closed so quickly that I can't read the error message.
How can this be done in the Windows console?
By the way, doing this with other languages like Spanish or Portuguese works fine. It's languages like Japanese, Russian, Greek, and Hebrew that show this error behavior in the Windows console.
EDIT
As requested, I changed the code to this:
#coding=<utf8>
import os, sys
os.popen('chcp 65001')
print(sys.stdout.encoding)
x = raw_input('press enter to continue')
a = 'こんにちは世界'
print a.decode('utf8')
x = raw_input()
It will print:
cp437
and then, of course, it goes on to flash and fail at the decoding step.
It looks like popen('chcp 65001') does not change the code page.
I still don't think this is the root of the problem, but it would be helpful to know an efficient way of changing the code page.
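Since the window closes before the error can be read, one option is to save the traceback to a file (another is simply to start the script from an already open cmd window). The following is a minimal sketch of the first option, not the asker's code: `run_and_log` and the log path are hypothetical names, and the sketch uses only the standard `traceback` module.

```python
import traceback


def run_and_log(func, log_path):
    """Call func(); on failure, save the traceback to log_path and return False."""
    try:
        func()
        return True
    except Exception:
        # Write the full traceback to a file so it survives the window closing.
        with open(log_path, "w") as log:
            log.write(traceback.format_exc())
        return False
```

Wrapping the body of the script in a function and calling `run_and_log(main, "error.log")` would leave the error text in `error.log` even if the console disappears.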

Update
Never mind. The OP is using Windows.
Interestingly changing the encoding declaration to #encoding=<utf8> did not work in Ubuntu.
Original Answer
This worked for me (Ubuntu Jaunty, Python 2.6.2). The only change I made was to the first line declaring the encoding.
# encoding: utf-8
import os
os.popen('chcp 65001')
a = 'こんにちは世界'
print a.decode('utf8')
x = raw_input()
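A likely reason the first line matters: Python only honors an encoding declaration that matches a specific comment pattern, and `#coding=<utf8>` does not match it because of the angle brackets. The sketch below uses a pattern modeled on the one documented in PEP 263; `declared_encoding` is a hypothetical helper for illustration, not something Python exposes under that name.

```python
import re

# Pattern modeled on the one documented in PEP 263.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(line):
    """Return the encoding named in a source-file comment line, or None."""
    match = CODING_RE.match(line)
    return match.group(1) if match else None
```

With this helper, `# encoding: utf-8` and `# -*- coding: utf-8 -*-` both yield `utf-8`, while `#coding=<utf8>` yields `None`, so the file would be read with the default source encoding.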

Related

Can't input (č ć š ž đ ) in python 2.7.x console

I've been searching the internet and it is so frustrating. When I search, I get explanations of how to decode and encode Unicode in files, but I'm not interested in that. I know this is possible because I was able to do it before; I don't know what happened. I've tried reinstalling Python and changing the options under Configure IDLE, etc. On my laptop there are no problems at all. I can do this:
>> a = 'ć'
>>
>> print a
>> ć
And on my PC I get:
>> a = 'ć'
>> Unsupported characters in input
I repeat, I'm not talking about encoding in the program. I'm talking about the Python console, which works on my laptop and worked on my previous machines. There has to be a solution to this issue.
Also, take a look at this:
>>> a = u'ç'
>>> a
u'\xe7'
>>> print a
ç
>>> a = u'ć'
Unsupported characters in input
>>>
The Windows console is limited in what it can display. You can change the code page using the old DOS CHCP command.
CHCP 65001
This will change the code page to UTF-8 and make the console more tolerant of such characters. You will probably see a square instead of the actual character, but at least you won't see an error.
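One plausible reading of the behavior in the question is that 'ç' exists in the console's code page while 'ć' does not. A quick way to check which characters a given code page can represent is to try encoding them. This is only a sketch: `can_encode` is a hypothetical helper, and the code pages named in the usage note are examples rather than the asker's actual settings.

```python
def can_encode(text, code_page):
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False
```

For example, `can_encode(u"\u00e7", "cp437")` is true for 'ç', `can_encode(u"\u0107", "cp437")` is false for 'ć', and `can_encode(u"\u0107", "cp852")` is true, since cp852 is a Central European code page.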
Try this:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
...

ArcPy and Python encoding messing up?

I am facing strange behavior involving ArcPy and Python encoding. I work with the Visual Studio 2010 Shell with Python Tools for Visual Studio (PTVS) installed. I isolated my problem in a simple script file, which contains the commands shown further down. In Visual Studio, I have set « Advanced Save Options... » to « UTF-8 without signature ». The script simply prints an accented string to the screen, then imports the arcpy module, then prints the same string again. Importing ArcPy seems to change the Python encoding setup, but I don't know why, and I would like to restore it correctly because it causes problems throughout the original script.
I checked the Python « encodings » folder and erased every pyc file. Then I ran the script and it generated 3 pyc files:
cp850.pyc (which corresponds to my stdout.encoding)
cp1252.pyc (which corresponds to my Windows environment encoding)
utf_8.pyc (which fits the encoding of my script)
When ArcPy is imported, something alters the encoding in a way that affects the initial variables.
Why?
Is it possible, with some Python command, to find where ArcPy's cp1252 encoding is set and read it, so that I can write a function that deals with it?
# -*- coding: utf-8 -*-
import sys
print ('Loaded encoding : %(t)s'%{'t':sys.getdefaultencoding()})
reload(sys) # See stackoverflow question 2276200
sys.setdefaultencoding('utf-8')
print ('Set default encoding : %(t)s'%{'t':sys.getdefaultencoding()})
print ''
texte = u'Récuperation des données'
print ('Original type : %(t)s'%{'t':type(texte)})
print ('Original text : %(t)s'%{'t':texte})
print ''
import arcpy
print ('imported arcpy')
print ('Loaded encoding : %(t)s'%{'t':sys.getdefaultencoding()})
print ''
print ('arcpy mess up original type : %(t)s'%{'t':type(texte)})
print ('arcpy mess up original text : %(t)s'%{'t':texte})
print ''
print ('arcpy mess up reencoded with cp1252 type : %(t)s'%{'t':type(texte.encode('cp1252'))})
print ('arcpy mess up reencoded with cp1252 text : %(t)s'%{'t':texte.encode('cp1252')})
raw_input()
and when I run the script, I get these results :
Loaded encoding : ascii
Set encoding : utf-8
Original type : type 'unicode'
Original text : Récuperation des données <--- This is right
import arcpy
Loaded encoding : utf-8
arcpy mess up original type : type 'unicode'
arcpy mess up original text : R'cuperation des donn'es> <--- This is wrong
arcpy mess up ReEncode with cp1252 type : type 'str'
arcpy mess up ReEncode with cp1252 text : Récuperation des données> <--- This is fits with the original unicode
Answering my question.
From ESRI support, I got this information :
By default, python in the command line is not going to change the code page to a UTF-8 based text for print statements to show up in Unicode. ArcGIS on the other hand specifically allows unicode values to be passed to it and has changed the code page within the command line so that the values you see printed are the values ArcGIS is using. This is why the command line should be the only environment where you see the import sys followed by import arcpy give you a different printed value.
Since my application runs scripts that do not always need arcpy, depending on what I want it to do, I solved my problem by writing a generic function that deals with the encoding whether or not arcpy has been imported, using the information provided by:
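import locale
import sys
Coding_CMD_Window = sys.stdout.encoding  # encoding of the console window
Coding_OS = locale.getpreferredencoding()  # encoding of the Windows environment
Coding_Script = sys.getdefaultencoding()  # Python's default encoding
Coding2Use = Coding_CMD_Window
# Switch to the OS encoding once arcpy has been imported
if any('arcpy' in importedmodules for importedmodules in sys.modules):
    Coding2Use = Coding_OS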
Also, I made sure that all of my scripts had the proper UTF-8 encoding without signature.
Hope this helps anyone.
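The generic function itself is not shown in the answer. A minimal sketch of what it might look like, assuming it only has to choose between the console encoding and the OS encoding, follows. The name `pick_output_encoding` and its parameters are hypothetical; passing the module names in as an argument is a design choice made here so the logic can be tried without ArcGIS installed.

```python
import locale
import sys


def pick_output_encoding(loaded_modules, console_encoding, os_encoding):
    """Return the OS encoding if arcpy is loaded, else the console encoding."""
    if any("arcpy" in name for name in loaded_modules):
        return os_encoding
    return console_encoding


# Typical call with the real interpreter state (the result depends on the machine).
encoding_to_use = pick_output_encoding(
    sys.modules, sys.stdout.encoding, locale.getpreferredencoding()
)
```

A string could then be printed with something like `text.encode(encoding_to_use)`, which matches the cp1252 re-encoding that produced correct output in the question.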
For those in doubt, try something like the following (e.g., in a .py file):
import codecs
#import arcpy
f = codecs.open('utf.file.txt', encoding='utf-8-sig') #assuming a BOM present
l = f.readlines()
print u''.join(l)
Then run the same code once more, but first remove the hash comment from the arcpy line. It'll take about 6 seconds more time.
What I get is perfectly fine text running the first version, gibberish when allowing arcpy to load.
ArcGIS for Desktop version used: 10.2.1

Python pipe cp1252 string from PowerShell to a python (2.7) script

After a few days of dwelling over stackoverflow and python 2.7 doc, I have come to no conclusion about this.
Basically I'm running a python script on a windows server that must have as input a block of text. This block of text (unfortunately) has to be passed by a pipe. Something like:
PS > [something_that_outputs_text] | python .\my_script.py
So the problem is:
The server uses cp1252 encoding and I really cannot change it due to administrative regulations and whatnot. When I pipe the text to my Python script and read it, it already contains ? where characters like \xe1 should be.
What I have done so far:
Tested with UTF-8. Yep, chcp 65001 and $OutputEncoding = [Console]::OutputEncoding "solve it", as in python gets the text perfectly and then I can decode it to unicode etc. But apparently they don't let me do it on the server /sadface.
A little script to test what the hell is happening:
import codecs
import sys
def main(argv=None):
    if argv is None:
        argv = sys.argv
    if len(argv)>1:
        for arg in argv[1:]:
            print arg.decode('cp1252')
    sys.stdin = codecs.getreader('cp1252')(sys.stdin)
    text = sys.stdin.read().strip()
    print text
    return 0
if __name__=="__main__":
    sys.exit(main())
Tried it with both the codecs wrapping and without it.
My input & output:
PS > echo "Blá" | python .\testinput.py blé
blé
Bl?
--> So there's no problem with the argument (blé) but the piped text (Blá) is no good :(
I even converted the text string to hex and, yes, it gets flooded with 3f (AKA mr ?), so it's not a problem with the print.
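The 3f bytes are what a lossy encoder leaves behind when a character does not exist in the target encoding, which is why no decoding on the Python side can recover the original. The sketch below reproduces the effect under the assumption that the pipe encodes as ASCII, which is consistent with the `$OutputEncoding` fix mentioned earlier but is not confirmed in the question. `lossy_hex` is a hypothetical helper.

```python
import binascii


def lossy_hex(text, encoding="ascii"):
    """Encode text, replacing unencodable characters with '?', and return hex."""
    data = text.encode(encoding, "replace")
    return binascii.hexlify(data).decode("ascii")
```

With this helper, `lossy_hex(u"Bl\u00e1")` gives `426c3f`, while `lossy_hex(u"Bl\u00e1", "cp1252")` gives `426ce1`, the byte the script was expecting.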
[Also: it's my first question here... feel free to ask any more info about what I did]
EDIT
I don't know if this is relevant or not, but when I do sys.stdin.encoding it yields None
Update: So... I have no problems with cmd. Checked sys.stdin.encoding while running the program on cmd and everything went fine. I think my head just exploded.
How about saving the data into a file and piping it to Python on a CMD session? Invoke Powershell and Python on CMD. Like so,
c:\>powershell -command "c:\genrateDataForPython.ps1 -output c:\data.txt"
c:\>type c:\data.txt | python .\myscript.py
Edit
Another idea: convert the data to Base64 in PowerShell and decode it in Python. Base64 is simple in PowerShell, and I guess it isn't hard in Python either. Like so:
# Convert some accent chars to base64
$s = [Text.Encoding]::UTF8.GetBytes("éêèë")
[System.Convert]::ToBase64String($s)
# Output:
w6nDqsOow6s=
# Decode:
$d = [System.Convert]::FromBase64String("w6nDqsOow6s=")
[Text.Encoding]::UTF8.GetString($d)
# Output
éêèë
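The Python side of the Base64 idea could look roughly like this. It is a sketch using the standard `base64` module, assuming the PowerShell side encoded UTF-8 bytes as in the commands shown; `decode_base64_text` is a hypothetical name.

```python
import base64


def decode_base64_text(payload):
    """Decode a Base64 string that carries UTF-8 encoded text."""
    # Strip the trailing newline that a pipe usually adds.
    return base64.b64decode(payload.strip()).decode("utf-8")
```

In the script, the payload would come from `sys.stdin.read()`. Base64 output is plain ASCII, so it passes through the pipe unchanged whatever encoding the pipe uses.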

Unicode and locale issues

I am struggling to write a Python (version 2.7) script which makes use of some Unicode properties. The problem arises when I attempt to use the built-in locale module. Here is the code snippet that I am having issues with:
# -*- coding: utf-8 -*-
import datetime
import os
import locale
locale.setlocale(locale.LC_ALL, 'greek')
day = datetime.date.today()
dayFull = day.strftime('%A')
myString = u"ΚΑΛΗΜΕΡΑ"
print myString
print dayFull
While dayFull prints the current day name just fine (in greek letters), myString comes out in console as question mark characters. How can I fix it, can someone please point out my mistake here?
P.S. My system is a Windows 7 machine.
Use the correct Greek code page in the console, as well as a font that supports Greek characters, such as Consolas. This worked for me in Windows 7 and Python 2.7.3:
C:\>chcp 1253
Active code page: 1253
C:\>python temp.py
ΚΑΛΗΜΕΡΑ
Σάββατο
FYI, Python 3.3 works correctly with the (also Greek) 737 code page, but Python 2.7 prints:
C:\>temp.py
????????
Σάββατο
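The reason the code page matters is that the two Greek code pages mentioned here, 1253 and 737, assign different byte values to the same letters, so bytes written for one are misread under the other. The following sketch illustrates this with Python's built-in codecs. It does not reproduce the exact Python 2.7 console behavior, and `misread` is a hypothetical helper.

```python
# The Greek word for Saturday, written with escapes to stay encoding-neutral.
text = u"\u03a3\u03ac\u03b2\u03b2\u03b1\u03c4\u03bf"


def misread(value, written_as, read_as):
    """Encode value in one code page and decode the bytes in another."""
    return value.encode(written_as).decode(read_as, "replace")
```

Here `misread(text, "cp1253", "cp1253")` returns the original word, while `misread(text, "cp1253", "cp737")` returns different characters.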

ANSI Escape Sequences Fail in Cygwin

I am trying to create a 'drop down menu' for a CLI program using ANSI escape sequences in Python 2.7.2. I use ANSI escape sequences to change the 'options' to red and display them below the input line, then afterwards clear them.
I am able to run the code on a system running Ubuntu 10.04 LTS, which runs Python 2.6.5, but I am not able to get the program to run on a Windows XP machine running Cygwin minTTY 1.0.3. Is there an issue with sys.stdout.flush() in Windows or Cygwin? Is it a Python 2.6 to 2.7 issue? I don't really know where to start debugging.
#!C:\Python27\python.exe
#!/usr/bin/python
import sys
table = {1:'foo', 2:'bar', 3:'foo'}
print '\n'
# Print each option in bold red, one per line
for item in table.keys() :
    sys.stdout.write('\033[1;31m %s) %s\033[0m\n' % (item,table[item]))
    sys.stdout.flush()
# Move the cursor back up above the options
sys.stdout.write('%s' %((item+1)*'\033M'))
sys.stdout.flush()
answer = raw_input("Select foobar: ")
# Clear everything below the cursor
sys.stdout.write('\033[J')
sys.stdout.flush()
# raw_input returns a string, but the table keys are integers
print 'You have selected %s' % (table[int(answer)])
The problem is that the raw input text does not print out until after you make your selection in minTTY (again, code works fine on Ubuntu), which kind of defeats the purpose of prompt text. Thanks in advance - Paul
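For readers unfamiliar with the escape sequences used in the script, the sketch below wraps them in small named helpers. `red`, `cursor_up`, and `render_menu` are hypothetical names. The sequences themselves are the ones in the question's code, and whether a given terminal honors them is a separate matter.

```python
ESC = "\033"

CLEAR_BELOW = ESC + "[J"  # erase from the cursor to the end of the screen


def red(text):
    """Wrap text in the bold-red sequence followed by a reset."""
    return ESC + "[1;31m" + text + ESC + "[0m"


def cursor_up(lines):
    """Return ESC M (reverse index) repeated to move up the given number of lines."""
    return (ESC + "M") * lines


def render_menu(table):
    """Build the option lines, then move the cursor back above them."""
    rows = "".join(red(" %s) %s" % (key, table[key])) + "\n" for key in sorted(table))
    # One extra line up, matching the (item+1) count in the original script.
    return rows + cursor_up(len(table) + 1)
```

Writing `render_menu(table)` to `sys.stdout` and flushing, then prompting, then writing `CLEAR_BELOW`, follows the same sequence of steps as the original script.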
You are not able to do this because the Windows console does not support ANSI escape sequences at all.
Back in the MS-DOS days there was an ANSI.SYS driver that you could load to enable them, but not anymore.
My impression is that you will need to investigate something like https://pypi.python.org/pypi/UniCurses if you want to build a TUI (text user interface).
References:
How to make win32 console recognize ANSI/VT100 escape sequences?
