print "Düsseldorf" with Python - python

I am trying to use the string Düsseldorf. When I run this:
# -*- coding: utf-8 -*-
print "Düsseldorf"
it prints strange characters. Could anyone help me please ?
Thank you very much.

>>> print u"Düsseldorf"
Düsseldorf
"Unicode In Python, Completely Demystified"

Most likely, your editor is not set to produce UTF-8 output. Setting it to output UTF-8 should fix the problem.
Alternatively, use unicode escapes:
print u"D\u00FCsseldorf"
Note that non-ASCII string literals in Python 2.x should be prefixed with a u (for unicode). Unprefixed literals (like "Düsseldorf") generate str objects, which are byte arrays (despite the name), not strings. Therefore, in Python 2.x with a correctly configured editor, you want:
print u"Düsseldorf"
In Python 3.x, the situation has been rectified by letting str objects represent, well, strings, and introducing the bytes type for byte arrays, as in b'D\xc3\xbcsseldorf'.
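The str/bytes split described above can be checked directly. This is a minimal sketch, assuming Python 3; the variable names are illustrative and the text is written with an escape so the snippet does not depend on the editor's encoding:

```python
# Python 3: str holds text (code points), bytes holds encoded data.
city = "D\u00fcsseldorf"            # same text as "Düsseldorf"

encoded = city.encode("utf-8")      # text -> bytes
print(repr(encoded))                # b'D\xc3\xbcsseldorf'

decoded = encoded.decode("utf-8")   # bytes -> text
print(len(city), len(encoded))      # 10 code points, 11 bytes
```

The ü is one code point but two bytes in UTF-8, which is why the two lengths differ.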

How can I print characters like ♟ in python

I am trying to print a clean chess board in python 2.7 that uses unique characters such as ♟.
I have tried simply replacing a value in a string ("g".replace(g, ♟)) but it is changed to '\xe2\x80\xa6'. If I put the character into an online ASCII converter, it returns "226 153 159"
♟ is a Unicode character. In Python 2, str holds ASCII strings or binary data, while unicode holds Unicode strings. When you write "♟" you get a byte-encoded version of the Unicode string. Which encoding that is depends on the editor or console you used to type it in. It's common (and I think preferred) to use UTF-8 to encode strings, but you may find that Windows editors favor little-endian UTF-16.
Either way, you want to write your strings as unicode as much as possible. You can do some mix-and-matching between str and unicode but make sure anything outside of the ASCII code set is unicode from the beginning.
Python can take an encoding hint at the front of the file. So, assuming you use a UTF-8 editor, you can do
#!/usr/bin/env python
# -*- coding: utf-8 -*-
chess_piece = u"♟"
print u"g".replace(u"g", chess_piece)
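The "226 153 159" reported in the question is the UTF-8 encoding of the character, written in decimal. A small sketch, assuming Python 3 (where every str literal is already Unicode); the names are illustrative:

```python
import unicodedata

pawn = "\u265f"                        # the character from the question
print(unicodedata.name(pawn))          # BLACK CHESS PAWN

utf8_bytes = pawn.encode("utf-8")
print(list(utf8_bytes))                # [226, 153, 159]

# Replacing on text (not on bytes) keeps the character intact.
board_row = "g.g".replace("g", pawn)
print(ascii(board_row))                # '\u265f.\u265f'
```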

python u'\u00b0' returns u'\xb0'. Why?

I use python 2.7.10.
On dealing with character encoding, and after reading a lot of stack-overflow etc. etc. on the subject, I encountered this behaviour which looks strange to me. Python interpreter input
>>>u'\u00b0'
results in the following output:
u'\xb0'
I could repeat this behaviour using a dos window, the idle console, and the wing-ide python shell.
My assumptions (correct me if I am wrong):
The "degree symbol" has unicode 0x00b0, utf-8 code 0xc2b0, latin-1 code 0xb0.
The Python docs say that a string literal with a u prefix is a Unicode string.
Question: Why is the result converted to a unicode-string-literal with a byte-escape-sequence which matches the latin-1 encoding, instead of persisting the unicode escape sequence ?
Thanks in advance for any help.
Python uses some rules for determining what to output from repr for each character. The rule for Unicode character codepoints in the 0x0080 to 0x00ff range is to use the sequence \xdd where dd is the hex code, at least in Python 2. There's no way to change it. In Python 3, all printable characters will be displayed without converting to a hex code.
As for why it looks like Latin-1 encoding, it's because Unicode started with Latin-1 as the base. All the codepoints up to 0xff match their Latin-1 counterpart.
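Both points can be verified with a short check. This sketch assumes Python 3, where repr() shows printable characters directly, so ascii() is used to make the escapes visible:

```python
degree = "\u00b0"

# \u00b0 and \xb0 are two spellings of the same code point in a text literal.
print(degree == "\xb0")                 # True
print(ascii(degree))                    # '\xb0'

# Code points 0x00-0xff encode to the identical single byte in Latin-1.
matches_latin1 = all(
    chr(cp).encode("latin-1") == bytes([cp]) for cp in range(0x100)
)
print(matches_latin1)                   # True

# The UTF-8 encoding is a different, two-byte sequence.
print(degree.encode("utf-8"))           # b'\xc2\xb0'
```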

Change the default encoding for automatic str to unicode conversion

When doing the following concatenation:
a = u'Hello there '
b = 'pirate ®'
c = a + b # This will raise UnicodeDecodeError
in python 2, 'pirate ®' is automatically converted to unicode type through ascii encoding. And since there is a non-ascii unicode sequence (®) in the string, it will fail.
Is there a way to change this default encoding to utf8?
It is possible, although it's considered a hack. You have to reload sys:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
See this blog post for some explanation of the potential issues this raises:
http://blog.startifact.com/posts/older/changing-the-python-default-encoding-considered-harmful.html
It may be the only option you have, but you should be aware that it can lead to further problems, which is why it is deliberately not a simple and easy thing to set.
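An alternative that avoids changing the global default is to decode the byte string explicitly where it meets Unicode text. This is a sketch of that alternative, not the approach from the answer above. It is written for Python 3, with a bytes object standing in for the Python 2 str, and it assumes the bytes are UTF-8:

```python
a = "Hello there "
b = "pirate \u00ae".encode("utf-8")    # stands in for the Python 2 byte string 'pirate ®'

# Python 2 decodes implicitly as ASCII; doing the same explicitly shows the failure.
try:
    b.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

c = a + b.decode("utf-8")               # decode with the real encoding, then concatenate
print(ascii_failed, ascii(c))           # True 'Hello there pirate \xae'
```

In Python 2 the same idea reads c = a + b.decode('utf-8').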
From the Python Unicode Howto:
Ideally, you’d want to be able to write literals in your language’s natural encoding. You could then edit Python source code with your favorite editor which would display the accented characters naturally, and have the right characters used at runtime.
Python supports writing Unicode literals in any encoding, but you have to declare the encoding being used. This is done by including a special comment as either the first or second line of the source file:
#!/usr/bin/env python
# -*- coding: latin-1 -*-
u = u'abcdé'
print ord(u[-1])
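What the declaration changes is how the bytes of the literal are decoded. A sketch, assuming Python 3 and decoding by hand the bytes an editor would have saved; the variable names are illustrative:

```python
text = "abcd\u00e9"                      # 'abcdé'

latin1_bytes = text.encode("latin-1")    # what a Latin-1 editor saves: 5 bytes
utf8_bytes = text.encode("utf-8")        # what a UTF-8 editor saves: 6 bytes

# Decoding with the matching encoding recovers é, code point 233.
last_code_point = ord(latin1_bytes.decode("latin-1")[-1])
print(last_code_point)                   # 233

# Declaring latin-1 for a file saved as UTF-8 yields two wrong characters instead.
mojibake = utf8_bytes.decode("latin-1")
print(ascii(mojibake[-2:]))              # '\xc3\xa9'
```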

Unsupported characters in input In Python IDLE

suffixes = {
1: ["ो", "े", "ू", "ु", "ी", "ि", "ा"]}
When I run this, the message given by IDLE is
Unsupported characters in input
I also do not see the proper font in the MS-DOS console.
What encoding is your source file in?
If it is UTF8, put the comment
# -*- coding: utf-8 -*-
at the top of the file.
If you don't declare an encoding on the first or second line of your Python source file, the Python 2 interpreter uses ASCII to decode the characters in the file. Since the characters you used cannot be decoded as ASCII, you get an error.
The solution is as @RemcoGerlich said. Here is what the documentation says:
The encoding is used for all lexical analysis, in particular to find the end of a string, and to interpret the contents of Unicode literals. String literals are converted to Unicode for syntactical analysis, then converted back to their original encoding before interpretation starts. The encoding declaration must appear on a line of its own.
This seems to be a known bug in the 2.x IDLE console: http://bugs.python.org/issue15809. A fix was made for Python 3.x, but doesn't appear to be backported.
Instead, use an alternative console, such as iPython/Jupyter, or a fully-fledged IDE, such as PyCharm.
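If neither the declaration nor another console helps, the same data can be written with Unicode escapes so that the source stays pure ASCII. This is a sketch of a workaround, assuming Python 3 (in Python 2 each literal would need a u prefix):

```python
import unicodedata

# The seven vowel signs from the question, written as escapes.
suffixes = {
    1: ["\u094b", "\u0947", "\u0942", "\u0941", "\u0940", "\u093f", "\u093e"],
}

# Print each escape next to its official Unicode name as a sanity check.
for sign in suffixes[1]:
    print(ascii(sign), unicodedata.name(sign))
```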

encoding in python: what type is the variable

Python file
# -*- coding: UTF-8 -*-
a = 'Köppler'
print a
print a.__class__.__name__
mydict = {}
mydict['name'] = a
print mydict
print mydict['name']
Output:
Köppler
str
{'name': 'K\xc3\xb6ppler'}
Köppler
It seems that the name remains the same, but only when printing a dictionary I get this strange escaped character string. What am I looking at then? Is that the UTF-8 representation?
The reason for that behavior is that the __repr__ function in Python 2 escapes non-ASCII unicode characters. As the link shows, this is fixed in Python 3.
Yes, that's the UTF-8 representation of ö (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS). It consists of a 0xC3 octet followed by a 0xB6 octet. UTF-8 is a very elegant encoding, I think, and worth reading up on. The history of its design (on a placemat in a diner) is described here by Rob Pike.
As far as I know, there are two functions in Python for displaying objects: str() and repr(). str() is used internally by print; however, a dict's str() apparently uses repr() for its keys and values.
As it has been mentioned: repr() escapes unicode characters.
It seems you are using python 2.x, where you have to specify that the object is actually a unicode string and not a plain ascii. You specified that the code is utf-8, thus you actually typed 2 bytes for your ö, and as it is a regular string, you got the 2 escaped chars.
Try making it a unicode literal: a = u'Köppler'. You may need to encode it before printing, depending on your console encoding: print a.encode('utf-8')
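The difference between printing a value and printing a container that holds it can be reproduced in a few lines. A sketch, assuming Python 3, with a bytes object standing in for the Python 2 str from the question:

```python
name = "K\u00f6ppler"                   # text
raw = name.encode("utf-8")              # the bytes a Python 2 str literal would hold

print(repr(raw))                        # b'K\xc3\xb6ppler'
print(ascii({"name": name}))            # {'name': 'K\xf6ppler'}

# str() of a dict formats its keys and values with repr().
d = {"name": raw}
uses_repr = str(d) == "{'name': " + repr(raw) + "}"
print(uses_repr)                        # True
```

The two escaped bytes \xc3\xb6 are the UTF-8 encoding of ö, while the text version is the single code point \xf6.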
