Python, Windows console and encodings (cp850 vs cp1252)

I thought I knew everything about encodings and Python, but today I came across a weird problem: although the console is set to code page 850 - and Python reports it correctly - parameters I put on the command line seem to be encoded in code page 1252. If I try to decode them with sys.stdin.encoding, I get the wrong result. If I assume 'cp1252', ignoring what sys.stdout.encoding reports, it works.
Am I missing something, or is this a bug in Python? In Windows? Note: I am running Python 2.6.6 on Windows 7 EN, locale set to French (Switzerland).
In the test program below, I check that literals are correctly interpreted and can be printed - this works. But all values I pass on the command line seem to be encoded wrongly:
#!/usr/bin/python
# -*- encoding: utf-8 -*-
import sys

literal_mb = 'utf-8 literal: üèéÃÂç€ÈÚ'
literal_u = u'unicode literal: üèéÃÂç€ÈÚ'

print "Testing literals"
print literal_mb.decode('utf-8').encode(sys.stdout.encoding, 'replace')
print literal_u.encode(sys.stdout.encoding, 'replace')

print "Testing arguments ( stdin/out encodings:", sys.stdin.encoding, "/", sys.stdout.encoding, ")"
for i in range(1, len(sys.argv)):
    arg = sys.argv[i]
    print "arg", i, ":", arg
    for ch in arg:
        print "  ", ch, "->", ord(ch),
        if ord(ch) >= 128 and sys.stdin.encoding == 'cp850':
            print "<-", ch.decode('cp1252').encode(sys.stdout.encoding, 'replace'), "[assuming input was actually cp1252 ]"
        else:
            print ""
In a newly created console, when running
C:\dev>test-encoding.py abcé€
I get the following output
Testing literals
utf-8 literal: üèéÃÂç?ÈÚ
unicode literal: üèéÃÂç?ÈÚ
Testing arguments ( stdin/out encodings: cp850 / cp850 )
arg 1 : abcÚÇ
a -> 97
b -> 98
c -> 99
Ú -> 233 <- é [assuming input was actually cp1252 ]
Ç -> 128 <- ? [assuming input was actually cp1252 ]
while I would expect the 4th character to have an ordinal value of 130 instead of 233 (see the code pages 850 and 1252).
Notes: the value of 128 for the euro symbol is a mystery - since cp850 does not have it. Otherwise, the '?' are expected - cp850 cannot print the characters and I have used 'replace' in the conversions.
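The expectation of 130 can be checked directly; a quick comparison (in Python 3 syntax) of the byte 'é' maps to in the two code pages:

```python
# 'é' is byte 0x82 (130) in cp850 but 0xE9 (233) in cp1252, so receiving
# 233 from a cp850 console shows the arguments were encoded in cp1252.
e_cp850 = 'é'.encode('cp850')
e_cp1252 = 'é'.encode('cp1252')
print(e_cp850, e_cp1252)  # b'\x82' b'\xe9'
```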
If I change the code page of the console to 1252 by issuing chcp 1252 and run the same command, I (correctly) obtain
Testing literals
utf-8 literal: üèéÃÂç€ÈÚ
unicode literal: üèéÃÂç€ÈÚ
Testing arguments ( stdin/out encodings: cp1252 / cp1252 )
arg 1 : abcé€
a -> 97
b -> 98
c -> 99
é -> 233
€ -> 128
Any ideas what I'm missing?
Edit 1: I've just tested by reading sys.stdin. This works as expected: in cp850, typing 'é' results in an ordinal value of 130. So the problem really is with the command line only. Is the command line treated differently from the standard input?
Edit 2: It seems I had the wrong keywords. I found another very close topic on SO: Read Unicode characters from command-line arguments in Python 2.x on Windows. Still, if the command line is not encoded like sys.stdin, and since sys.getdefaultencoding() reports 'ascii', there seems to be no way to know its actual encoding. I find the answer using the win32 extensions pretty hacky.

Replying to myself:
On Windows, the encoding used by the console (thus, that of sys.stdin/out) differs from the encoding of various OS-provided strings - obtained through e.g. os.getenv(), sys.argv, and certainly many more.
The encoding provided by sys.getdefaultencoding() is really just that - a default, chosen by the Python developers to be the "most reasonable encoding" for the interpreter to use in extreme cases. I get 'ascii' on my Python 2.6, and tried with portable Python 3.1, which yields 'utf-8'. Neither is what we are looking for - they are merely fallbacks for encoding conversion functions.
As this page seems to state, the encoding used by OS-provided strings is governed by the Active Code Page (ACP). Since Python does not have a native function to retrieve it, I had to use ctypes:
from ctypes import cdll
os_encoding = 'cp' + str(cdll.kernel32.GetACP())
Edit: But as Jacek suggests, there actually is a more robust and Pythonic way to do it (semantics would need validation, but until proven wrong, I'll use this)
import locale
os_encoding = locale.getpreferredencoding()
# This returns 'cp1252' on my system, yay!
and then
u_argv = [x.decode(os_encoding) for x in sys.argv]
u_env = os.getenv('myvar').decode(os_encoding)
On my system, os_encoding = 'cp1252', so it works. I am quite certain this would break on other platforms, so feel free to edit and make it more generic. We would certainly need some kind of translation table between the ACP reported by Windows and the Python encoding name - something better than just prepending 'cp'.
This is unfortunately a hack, although I find it a bit less intrusive than the one suggested by this ActiveState Code Recipe (linked to by the SO question mentioned in Edit 2 of my question). The advantage I see here is that it can be applied to os.getenv(), and not only to sys.argv.
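For illustration, the pieces above could be combined into a small helper (os_decode is a name I made up; this is a sketch, not part of the original answer - on Python 3, sys.argv entries are already unicode, so the passthrough branch applies):

```python
import locale

def os_decode(s, encoding=None):
    # Decode an OS-provided byte string (a sys.argv entry, an os.getenv()
    # result) using the locale's preferred encoding, which reflects the
    # ANSI code page on Windows; pass unicode strings through untouched.
    encoding = encoding or locale.getpreferredencoding()
    return s.decode(encoding) if isinstance(s, bytes) else s
```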

I tried the solutions above, but some encoding problems remained. You need to use TrueType fonts.
Fix:
Run chcp 65001 in cmd to change the encoding to UTF-8.
Change the cmd font to a TrueType one, like Lucida Console, that supports the preceding code pages as well as 65001.
Here's my complete fix for the encoding error:
def fixCodePage():
    import sys
    import os
    import ctypes
    if sys.platform == 'win32':
        if sys.stdout.encoding != 'cp65001':
            os.system("echo off")
            os.system("chcp 65001")  # Change the active code page
            sys.stdout.write("\x1b[A")  # Removes the output of the chcp command
            sys.stdout.flush()
        LF_FACESIZE = 32
        STD_OUTPUT_HANDLE = -11

        class COORD(ctypes.Structure):
            _fields_ = [("X", ctypes.c_short), ("Y", ctypes.c_short)]

        class CONSOLE_FONT_INFOEX(ctypes.Structure):
            _fields_ = [("cbSize", ctypes.c_ulong),
                        ("nFont", ctypes.c_ulong),
                        ("dwFontSize", COORD),
                        ("FontFamily", ctypes.c_uint),
                        ("FontWeight", ctypes.c_uint),
                        ("FaceName", ctypes.c_wchar * LF_FACESIZE)]

        # Switch the console font to Lucida Console (a TrueType font)
        font = CONSOLE_FONT_INFOEX()
        font.cbSize = ctypes.sizeof(CONSOLE_FONT_INFOEX)
        font.nFont = 12
        font.dwFontSize.X = 7
        font.dwFontSize.Y = 12
        font.FontFamily = 54
        font.FontWeight = 400
        font.FaceName = "Lucida Console"
        handle = ctypes.windll.kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
        ctypes.windll.kernel32.SetCurrentConsoleFontEx(handle, ctypes.c_long(False), ctypes.pointer(font))
Note: You can see a font change while executing the program.

Well, what worked for me was using the following code snippet:
# -*- coding: utf-8 -*-
import os
import sys
print (f"OS: {os.device_encoding(0)}, sys: {sys.stdout.encoding}")
Comparing both on several Windows systems with Python 3.8 showed that os.device_encoding(0) always reflected the code page setting of the terminal. (Tested with PowerShell and with the old cmd shell on Windows 10 and Windows 7.)
This was even true after changing the terminal's code page with the shell command:
chcp 850
or e.g.:
chcp 1252
Now using os.device_encoding(0) for tasks like decoding a subprocess stdout result from bytes to string worked out, even with non-ASCII chars like é, ö, ³, ↓, ...
So, as others have already pointed out: on Windows, the locale setting is really just system information about user preferences, not what the shell might actually be using at the moment.
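A minimal sketch of that check (the fallback is my addition: os.device_encoding(0) returns None when stdin is not attached to a real console, e.g. when the script's input is redirected):

```python
import os
import locale

def console_encoding():
    # On Windows, os.device_encoding(0) reflects the console's current
    # code page (e.g. 'cp850' after `chcp 850`); it returns None when
    # stdin is not a terminal, hence the locale-based fallback.
    return os.device_encoding(0) or locale.getpreferredencoding()

print(console_encoding())
```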

Related

Python standard IO under Windows PowerShell and CMD

I have the following two-line Python (v. 3.10.7) program "stdin.py":
import sys
print(sys.stdin.read())
and the following one-line text file "ansi.txt" (CP1252 encoding) containing:
‘I am well’ he said.
Note that the open and close quotes are 0x91 and 0x92, respectively. In Windows 10 cmd the behavior of the Python code is as expected:
python stdin.py < ansi.txt # --> ‘I am well’ he said.
On the other hand in Windows Powershell:
cat .\ansi.txt | python .\stdin.py # --> ?I am well? he said.
Apparently the CP1252 characters are seen as non-printable characters in the Python/PowerShell combination. If I replace the standard input in "stdin.py" by file input, Python prints the CP1252 quote characters to the screen correctly. PowerShell by itself recognizes and prints 0x91 and 0x92 correctly.
Questions: can somebody explain to me why cmd works differently from PowerShell in combination with Python? Why doesn't Python recognize the CP1252 quote characters 0x91 and 0x92 when they are piped into it by PowerShell?
tl;dr
Use the $OutputEncoding preference variable:
In Windows PowerShell:
# Using the system's legacy ANSI code page, as Python does by default.
# NOTE: The & { ... } enclosure isn't strictly necessary, but
# ensures that the $OutputEncoding change is only temporary,
# by limiting it to the child scope that the enclosure creates.
& {
$OutputEncoding = [System.Text.Encoding]::Default
"‘I am well’ he said." | python -c 'import sys; print(sys.stdin.read())'
}
# Using UTF-8 instead, which is generally preferable.
# Note the `-X utf8` option (Python 3.7+)
& {
$OutputEncoding = [System.Text.UTF8Encoding]::new()
"‘I am well’ he said." | python -X utf8 -c 'import sys; print(sys.stdin.read())'
}
In PowerShell (Core) 7+:
# Using the system's legacy ANSI code page, as Python does by default.
# Note: In PowerShell (Core) / .NET 5+,
# [System.Text.Encoding]::Default now reports UTF-8,
# not the active ANSI encoding.
& {
$OutputEncoding = [System.Text.Encoding]::GetEncoding([cultureinfo]::CurrentCulture.TextInfo.ANSICodePage)
"‘I am well’ he said." | python -c 'import sys; print(sys.stdin.read())'
}
# Using UTF-8 instead, which is generally preferable.
# Note the `-X utf8` option (Python 3.7+)
# NO need to set $OutputEncoding, as it now *defaults* to UTF-8
"‘I am well’ he said." | python -X utf8 -c 'import sys; print(sys.stdin.read())'
Note:
$OutputEncoding controls what encoding is used to send data TO external programs via the pipeline (to stdin). It defaults to ASCII(!) in Windows PowerShell, and UTF-8 in PowerShell (Core).
[Console]::OutputEncoding controls how data received FROM external programs (via stdout) is decoded. It defaults to the console's active code page, which in turn defaults to the system's legacy OEM code page (such as 437 on US-English systems).
That these two encodings are not aligned by default is unfortunate; while Windows PowerShell will see no more changes, there is hope for PowerShell (Core): it would make sense to have it default consistently to UTF-8:
GitHub issue #7233 suggests at least defaulting the shortcut files that launch PowerShell to UTF-8 (code page 65001); GitHub issue #14945 more generally discusses the problematic mismatch.
In Windows 10 and above, there is an option to switch to UTF-8 system-wide, which then makes both the OEM and ANSI code pages default to UTF-8 (65001); however, this has far-reaching consequences and is still labeled as being in beta as of Windows 11 - see this answer.
Background information:
It is the $OutputEncoding preference variable that determines what character encoding PowerShell uses to send data (invariably text, as of PowerShell 7.3) to an external program via the pipeline.
Note that this even applies when data is read from a file: PowerShell, as of v7.3, never sends raw bytes through the pipeline: it reads the content into .NET strings first and then re-encodes them based on $OutputEncoding on sending them through the pipeline to an external program.
Therefore, what encoding your ansi.txt input file uses is ultimately irrelevant, as long as PowerShell decodes it correctly when reading it into .NET strings (which are internally composed of UTF-16 code units).
See this answer for more information.
Thus, the character encoding stored in $OutputEncoding must match the encoding that the target program expects.
By default, the encoding in $OutputEncoding is unrelated to the encoding implied by the console's active code page (which itself defaults to the system's legacy OEM code page, such as 437 on US-English systems), which is what at least legacy console applications tend to use. Python, however, does not: it uses the legacy ANSI code page. Other modern CLIs, notably NodeJS' node.exe, always use UTF-8.
While $OutputEncoding's default in PowerShell (Core) 7+ is now UTF-8, Windows PowerShell's default is, regrettably, ASCII(!), which means that non-ASCII characters get "lossily" transliterated to verbatim ASCII ? characters, which is what you saw.
Therefore, you must (temporarily) set $OutputEncoding to the encoding that Python expects, and/or ask it to use UTF-8 instead.
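The effect of -X utf8 can also be observed from the Python side; a sketch (requires Python 3.7+) that launches a child interpreter with the flag and reports what encoding it assigns to its piped stdin:

```python
import subprocess
import sys

# With -X utf8 the child interpreter uses UTF-8 for its standard streams
# regardless of the console's active code page (PEP 540, UTF-8 mode).
out = subprocess.run(
    [sys.executable, "-X", "utf8", "-c",
     "import sys; print(sys.stdin.encoding)"],
    input=b"", capture_output=True).stdout
print(out.decode("utf-8").strip())  # utf-8
```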

What's the deal with Python 3.4, Unicode, different languages and Windows?

Happy examples:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
czech = u'Leoš Janáček'.encode("utf-8")
print(czech)
pl = u'Zdzisław Beksiński'.encode("utf-8")
print(pl)
jp = u'リング 山村 貞子'.encode("utf-8")
print(jp)
chinese = u'五行'.encode("utf-8")
print(chinese)
MIR = u'Машина для Инженерных Расчётов'.encode("utf-8")
print(MIR)
pt = u'Minha Língua Portuguesa: çáà'.encode("utf-8")
print(pt)
Unhappy output:
b'Leo\xc5\xa1 Jan\xc3\xa1\xc4\x8dek'
b'Zdzis\xc5\x82aw Beksi\xc5\x84ski'
b'\xe3\x83\xaa\xe3\x83\xb3\xe3\x82\xb0 \xe5\xb1\xb1\xe6\x9d\x91 \xe8\xb2\x9e\xe5\xad\x90'
b'\xe4\xba\x94\xe8\xa1\x8c'
b'\xd0\x9c\xd0\xb0\xd1\x88\xd0\xb8\xd0\xbd\xd0\xb0 \xd0\xb4\xd0\xbb\xd1\x8f \xd0\x98\xd0\xbd\xd0\xb6\xd0\xb5\xd0\xbd\xd0\xb5\xd1\x80\xd0\xbd\xd1\x8b\xd1\x85 \xd0\xa0\xd0\xb0\xd1\x81\xd1\x87\xd1\x91\xd1\x82\xd0\xbe\xd0\xb2'
b'Minha L\xc3\xadngua Portuguesa: \xc3\xa7\xc3\xa1\xc3\xa0'
And if I print them like this:
jp = u'リング 山村 貞子'
print(jp)
I get:
Traceback (most recent call last):
File "x.py", line 5, in <module>
print(jp)
File "C:\Python34\lib\encodings\cp850.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position
0-2: character maps to <undefined>
I've also tried the following from this question (and other alternatives that involve sys.stdout.encoding):
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys
def safeprint(s):
    try:
        print(s)
    except UnicodeEncodeError:
        if sys.version_info >= (3,):
            print(s.encode('utf8').decode(sys.stdout.encoding))
        else:
            print(s.encode('utf8'))
jp = u'リング 山村 貞子'
safeprint(jp)
And things get even more cryptic:
リング 山村 貞子
And the docs were not very helpful.
So, what's the deal with Python 3.4, Unicode, different languages and Windows? Almost all possible examples I could find, deal with Python 2.x.
Is there a general and cross-platform way of printing ANY Unicode character from any language in a decent and non-nasty way in Python 3.4?
EDIT:
I've tried typing at the terminal:
chcp 65001
to change the code page, as proposed here and in the comments, but it did not work (including the attempt with sys.stdout.encoding).
Update: Since Python 3.6, the code example that prints Unicode strings directly should just work now (even without py -mrun).
Python can print text in multiple languages in the Windows console, whatever chcp says:
T:\> py -mpip install win-unicode-console
T:\> py -mrun your_script.py
where your_script.py prints Unicode directly e.g.:
#!/usr/bin/env python3
print('š áč') # cz
print('ł ń') # pl
print('リング') # jp
print('五行') # cn
print('ш я жх ё') # ru
print('í çáà') # pt
All you need is to configure the font in your Windows console that can display the desired characters.
You could also run your Python script via IDLE without installing non-stdlib modules:
T:\> py -midlelib -r your_script.py
To write to a file/pipe, use PYTHONIOENCODING=utf-8 as @Mark Tolonen suggested:
T:\> set PYTHONIOENCODING=utf-8
T:\> py your_script.py >output-utf8.txt
Only the last solution supports non-BMP characters such as 😒 (U+1F612 UNAMUSED FACE) -- py -mrun can write them, but the Windows console displays them as boxes even if the font supports the corresponding Unicode characters (though you can copy-paste the boxes into another program to get the characters).
The problem was (see the Python 3.6 update below) with the Windows console, which supports an ANSI character set appropriate for the region targeted by your version of Windows. Python throws an exception by default when unsupported characters are output.
Python can read an environment variable to output in other encodings, or to change the error-handling default. Below, I've read the console default and changed the default error handling to print a ? instead of throwing an error for characters that are unsupported in the console's current code page.
C:\>chcp
Active code page: 437 # Note, US Windows OEM code page.
C:\>set PYTHONIOENCODING=437:replace
C:\>example.py
Leo? Janá?ek
Zdzis?aw Beksi?ski
??? ?? ??
??
?????? ??? ?????????? ????????
Minha Língua Portuguesa: çáà
Note the US OEM code page is limited to ASCII and some Western European characters.
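What the replace error handler does can be sketched directly in Python 3 (the string is from the example above; cp437 has á but lacks š and č):

```python
text = 'Leoš Janáček'
# With errors='replace', characters that cp437 cannot represent become
# '?' instead of raising UnicodeEncodeError - the same substitution that
# PYTHONIOENCODING=437:replace applies to print().
encoded = text.encode('cp437', errors='replace')
print(encoded.decode('cp437'))  # Leo? Janá?ek
```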
Below I've instructed Python to use UTF8, but since the Windows console doesn't support it, I redirect the output to a file and display it in Notepad:
C:\>set PYTHONIOENCODING=utf8
C:\>example >out.txt
C:\>notepad out.txt
On Windows, it's best to use a Python IDE that supports UTF-8 instead of the console when working with multiple languages. If only using one language, select it as the system locale in the Region and Language control panel, and the console will support the characters of that language.
Update for Python 3.6
Python 3.6 now uses Windows Unicode APIs to write directly to the console, so the only limit is the console font's support for the characters. The following code works in a US Windows console. I have a Chinese language pack installed; it even displays the Chinese and Japanese if the console font is changed. Even without the correct font, replacement characters are shown in the console. Cut-n-paste to an environment such as this web page will display the characters correctly.
#!python3.6
#coding: utf8
czech = 'Leoš Janáček'
print(czech)
pl = 'Zdzisław Beksiński'
print(pl)
jp = 'リング 山村 貞子'
print(jp)
chinese = '五行'
print(chinese)
MIR = 'Машина для Инженерных Расчётов'
print(MIR)
pt = 'Minha Língua Portuguesa: çáà'
print(pt)
Output:
Leoš Janáček
Zdzisław Beksiński
リング 山村 貞子
五行
Машина для Инженерных Расчётов
Minha Língua Portuguesa: çáà

Can't input (č ć š ž đ ) in python 2.7.x console

So, I've been searching through the internet and it is so frustrating. When I search, I get explanations on how to Unicode-decode and -encode files, but I'm not interested in that. I know this is possible, since I was able to do it; I don't know what happened. Also, I've tried reinstalling Python, changing the options under Configure IDLE, etc. On my laptop there are no problems at all. I can do this:
>>> a = 'ć'
>>> print a
ć
And on my PC I get:
>>> a = 'ć'
Unsupported characters in input
I repeat, I'm not talking about encoding in the program. I'm talking about the Python console; it works on my laptop and worked on my previous machines. There's got to be a solution to this issue.
Also, take look at this:
>>> a = u'ç'
>>> a
u'\xe7'
>>> print a
ç
>>> a = u'ć'
Unsupported characters in input
>>>
The Windows console is limited in what it can display. You can change the code page using the old DOS CHCP command.
CHCP 65001
This will change the code page to UTF-8, and make the console more relaxed. You will probably see a square instead of the actual character, but at least you won't see an error.
Try to:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
...

Python pipe cp1252 string from PowerShell to a python (2.7) script

After a few days of dwelling over stackoverflow and python 2.7 doc, I have come to no conclusion about this.
Basically I'm running a python script on a windows server that must have as input a block of text. This block of text (unfortunately) has to be passed by a pipe. Something like:
PS > [something_that_outputs_text] | python .\my_script.py
So the problem is:
The server uses cp1252 encoding and I really cannot change it, due to administrative regulations and whatnot. And when I pipe the text to my Python script, when I read it, it already comes with ? where characters like \xe1 should be.
What I have done so far:
Tested with UTF-8. Yep, chcp 65001 and $OutputEncoding = [Console]::OutputEncoding "solve it", as in Python gets the text perfectly and then I can decode it to unicode etc. But apparently they don't let me do it on the server /sadface.
A little script to test what the hell is happening:
import codecs
import sys
def main(argv=None):
    if argv is None:
        argv = sys.argv
    if len(argv) > 1:
        for arg in argv[1:]:
            print arg.decode('cp1252')
    sys.stdin = codecs.getreader('cp1252')(sys.stdin)
    text = sys.stdin.read().strip()
    print text
    return 0

if __name__ == "__main__":
    sys.exit(main())
Tried it with both the codecs wrapping and without it.
My input & output:
PS > echo "Blá" | python .\testinput.py blé
blé
Bl?
--> So there's no problem with the argument (blé) but the piped text (Blá) is no good :(
I even converted the text string to hex and, yes, it gets flooded with 3f (AKA mr ?), so it's not a problem with the print.
[Also: it's my first question here... feel free to ask any more info about what I did]
EDIT
I don't know if this is relevant or not, but when I do sys.stdin.encoding it yields None
Update: So... I have no problems with cmd. Checked sys.stdin.encoding while running the program on cmd and everything went fine. I think my head just exploded.
How about saving the data into a file and piping it to Python in a CMD session? Invoke PowerShell and Python from CMD, like so:
c:\>powershell -command "c:\genrateDataForPython.ps1 -output c:\data.txt"
c:\>type c:\data.txt | python .\myscript.py
Edit
Another idea: convert the data into base64 format in PowerShell and decode it in Python. Base64 is simple in PowerShell, and in Python it isn't hard either. Like so:
# Convert some accent chars to base64
$s = [Text.Encoding]::UTF8.GetBytes("éêèë")
[System.Convert]::ToBase64String($s)
# Output:
w6nDqsOow6s=
# Decode:
$d = [System.Convert]::FromBase64String("w6nDqsOow6s=")
[Text.Encoding]::UTF8.GetString($d)
# Output
éêèë
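The Python side is indeed short; a sketch that decodes the base64 string produced by the PowerShell snippet above:

```python
import base64

# Decode the base64 produced by PowerShell, then interpret the raw
# bytes as UTF-8 (the encoding GetBytes used on the other side).
raw = base64.b64decode("w6nDqsOow6s=")
print(raw.decode("utf-8"))  # éêèë
```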

Lost with encodings (shell and accents)

I'm having trouble with encodings.
I'm using version
Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
[GCC 4.6.1] on linux2
I have chars with accents like é à.
My scripts uses utf-8 encoding
#!/usr/bin/python
# -*- coding: utf-8 -*-
Users can type strings using raw_input() with:
def rlinput(prompt, prefill=''):
    readline.set_startup_hook(lambda: readline.insert_text(prefill))
    try:
        return raw_input(prompt)
    finally:
        readline.set_startup_hook()
called in the main loop of the 'pseudo' shell:
while to_continue:
    to_continue, feedback = action(unicode(rlinput(u'todo > '), 'utf-8'))
    os.system('clear')
    print T, u"\n" + feedback
Data are stored as pickle in files.
I managed to get the app working, but finally ended up with awkward things like this in the core file:
class Task():
    ...
    def __str__(self):
        r = (u"OK" if self._done else u"A faire").ljust(8) + self.getDesc()
        return r.encode('utf-8')
and so, in the shell file:
feedback = jaune + str(t).decode('utf-8') + vert + u" supprimée"
That's where I realized that I might be totally wrong with encoding/decoding.
So I tried to decode directly in rlinput, but failed.
I read some posts on Stack Overflow and re-read http://docs.python.org/library/codecs.html
Waiting for my Python book, I'm lost :/
I guess there is a lot of bad code, but my question here is only related to encoding issues.
You can find the code here (most comments are in French, sorry - it's for personal use and I'm a beginner; you'll also need yapsy - http://yapsy.sourceforge.net/ - then configure the paths, then in py_todo: ./todo_shell.py): http://bit.ly/rzp9Jm
Standard input and output are byte-based on all Unix systems. That's why you have to call the unicode function to get character strings from them. The decode error indicates that the bytes coming in are not valid UTF-8.
Basically, the problem is the assumption of UTF-8 encoding, which is not guaranteed. Confirm this by changing the encoding in your unicode call to 'ISO-8859-1', or by changing the character encoding of your terminal emulator to UTF-8. (PuTTY supports this, in the "Translation" menu.)
If the above experiment confirms this, your challenge is to support the locale of the user and deduce the correct encoding, or perhaps to make the user declare the encoding in a command line argument or configuration. The $LANG environment variable is about the best you can do without an explicit declaration, and I find it to be a poor indicator of the desired character encoding.
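One way to run that experiment in code (tolerant_decode is a name I made up; ISO-8859-1 maps every byte to a character, so it can serve as a never-failing fallback):

```python
def tolerant_decode(raw):
    # Try UTF-8 first; if the bytes are not valid UTF-8 (the error the
    # question hit), fall back to ISO-8859-1, which never fails.
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw.decode('iso-8859-1')

print(tolerant_decode('é'.encode('utf-8')))  # é
print(tolerant_decode(b'\xe9'))              # é (latin-1 fallback)
```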
As @wberry suggested, I checked the encodings - OK:
$ file --mime-encoding todo_shell.py task.py todo.py
todo_shell.py: utf-8
task.py: utf-8
todo.py: utf-8
$ echo $LANG
fr_FR.UTF-8
$ python -c "import sys; print sys.stdin.encoding"
UTF-8
As @eryksun suggested, I decoded the user input (and encoded the prefill strings submitted beforehand). This solved some problems, if my memory is good (I will test more deeply later):
def rlinput(prompt, prefill=''):
    readline.set_startup_hook(lambda: readline.insert_text(prefill.encode(sys.stdin.encoding)))
    try:
        return raw_input(prompt).decode(sys.stdin.encoding)
    finally:
        readline.set_startup_hook()
I still have problems, but my question was not well defined, so I can't get a clear answer. I feel less lost now and have directions to search. Thank you!
EDIT: I replaced the __str__ methods with __unicode__ and it killed some (all?) problems. Thanks @eryksun for the tips. (This link helped me: Python __str__ versus __unicode__.)
