Support more characters in Python 2.7

Whenever I try to use the characters "šđžćč" in Python 2.7, I get a non-ASCII character error.
This is fixed by adding # -*- coding: utf-8 -*- to the top of the file.
However, when I then print these characters, this happens. For example:
The code is print "Upiši svoj tekst:", but Upi┼íi svoj tekst: is printed.
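The garbled output matches what you get when the UTF-8 bytes of the string are displayed by a console that uses a different code page. The sketch below assumes the console uses code page 852 (a common Central European Windows console code page; this is an assumption, since the question does not say which code page is active) and reproduces the output from the question. It runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Runs on Python 2.7 and Python 3.

text = u"Upiši svoj tekst:"  # a Unicode string (the u prefix matters in 2.7)

# In Python 2.7, the plain literal "Upiši svoj tekst:" in a UTF-8 source
# file holds exactly these bytes:
utf8_bytes = text.encode("utf-8")

# Assumption: the console decodes program output with code page 852.
garbled = utf8_bytes.decode("cp852")
print(garbled)  # Upi┼íi svoj tekst:
```

In Python 2.7 the usual remedy is to print a Unicode literal instead, e.g. print u"Upiši svoj tekst:", so that print encodes the text for the console's own encoding. Whether the letters then display still depends on the console's code page supporting them.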

Related

Python Encoding Comment Format

I originally learned to specify the source code encoding in Python 2.7 this way:
# -*- coding: utf-8 -*-
I just noticed that PEP 263 also allows this:
# coding=utf-8
Are there any differences between these? What about editor compatibility, cross-platform behavior, etc.?
What about Python 3? Is this comment still needed in Python 3, or is all Python 3 code expected to be UTF-8 by default?
Take a look at PEP 3120, which changed the default encoding of Python source code to UTF-8.
For Python 3.x, the docs therefore say:
If a comment in the first or second line of the Python script matches
the regular expression coding[=:]\s*([-\w.]+), this comment is
processed as an encoding declaration [...] The recommended forms of an
encoding expression are:
# -*- coding: <encoding-name> -*-
which is recognized also by GNU Emacs, and
# vim:fileencoding=<encoding-name>
which is recognized by Bram Moolenaar’s VIM.
If no encoding declaration is found, the default encoding is UTF-8
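Python 3's standard library exposes this detection logic through tokenize.detect_encoding(), which takes a readline callable over bytes. A minimal sketch (Python 3 only; the helper name source_encoding is made up for illustration) that checks two declaration styles and the no-declaration default:

```python
import io
import tokenize

def source_encoding(source):
    """Return the encoding Python 3 would use to read the given source bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding

print(source_encoding(b"# -*- coding: utf-8 -*-\nx = 1\n"))     # utf-8
print(source_encoding(b"# vim:fileencoding=latin-1\nx = 1\n"))  # iso-8859-1
print(source_encoding(b"x = 1\n"))                              # utf-8 (the default)
```

Note that detect_encoding normalizes some names, which is why latin-1 is reported as iso-8859-1.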
The take-home message is therefore:
Python 3.x does not need UTF-8 to be specified, since it is the default.
How the coding line is written is to some degree a personal choice (the docs only give a recommendation); it only has to match the regex.
Since Python 3, the default source encoding is UTF-8. You can still change the encoding with the specially formatted comment # -*- coding: <encoding name> -*-.
The docs recommend this form because GNU Emacs also recognizes it.
Because Python checks whether the first two lines match the regex coding[=:]\s*([-\w.]+), # coding=utf-8 also works for declaring UTF-8, but GNU Emacs does not recognize it.
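To illustrate that only the regex match matters, here is a simplified sketch that applies the quoted pattern to single comment lines. The function name declared_codec is hypothetical, and the check deliberately ignores the "first two lines" rule that the real interpreter enforces.

```python
import re

# The pattern quoted from the Python 3 docs above.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")

def declared_codec(line):
    """Return the codec name declared on a single comment line, or None.

    Simplified: does not enforce the "first two lines" rule.
    """
    if not line.lstrip().startswith("#"):
        return None  # only comments can carry the declaration
    match = CODING_RE.search(line)
    return match.group(1) if match else None

for line in ["# -*- coding: utf-8 -*-", "# coding=utf-8",
             "# vim:fileencoding=utf-8", "# coding utf-8"]:
    print("%-26s -> %s" % (line, declared_codec(line)))
```

The Emacs, plain and Vim forms all yield utf-8; the last line has no colon or equals sign after coding, so it declares nothing.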

Print Urdu/Arabic Language in Console (Python)

I am a newbie and I don't know how to set up my console to print Urdu/Arabic characters. I am using Wing IDE. When I run this code:
print "طجکسعبکبطکسبطب"
I get this on my console:
طجکسعبکبطکسبطب
Save and declare your source file as UTF-8, and mark your string arguments as Unicode literals (u'your text') so that Python treats them as text rather than raw bytes.
Additionally, make sure that your terminal/prompt window uses a Unicode-capable encoding.
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
arabic_words = u'لغت العربیه'
print arabic_words
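To see why both the u'' prefix and the terminal setting matter, the sketch below compares the text with its UTF-8 bytes and shows what happens when the text is encoded for a terminal that has no Arabic letters. It runs on Python 2.7 and Python 3; cp1252 is used only as an example of such a terminal encoding.

```python
# -*- coding: utf-8 -*-
# Runs on Python 2.7 and Python 3.

arabic_words = u"لغت العربیه"  # text: a sequence of characters

# Encoded as UTF-8, each of these Arabic letters takes two bytes,
# so the byte string is longer than the text.
utf8_bytes = arabic_words.encode("utf-8")
print("%d characters, %d bytes" % (len(arabic_words), len(utf8_bytes)))  # 11 characters, 21 bytes

# A terminal encoding without Arabic letters (cp1252, chosen only as an
# example) cannot represent them; errors="replace" turns each into "?".
as_cp1252 = arabic_words.encode("cp1252", "replace")
print(repr(as_cp1252))
```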

Why does IDLE interpret and render Tamil Unicode characters improperly?

I have installed IDLE 3.4.0 on Ubuntu 14.04 and that works fine for English. I tried to print Tamil Unicode characters:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
print('\u0B85')
print('\u0BA4\u0BC1')
I thought this would print the Tamil characters 'அ' and 'து'. The first one printed correctly, but the second one did not.
My output looks like this: http://imgur.com/87y1jWE. I want the characters printed as described above, without them breaking apart. How can I do that? The same issue occurs with Python in the terminal.
After some searching, I learned that this is not a Python issue but a rendering issue. So how can I make IDLE render Unicode characters properly?
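One way to see why the second string is harder to display: it consists of two code points, a consonant followed by a vowel sign that the renderer must combine with it, while the first string is a single letter. A Python 3 sketch using the standard unicodedata module to list them:

```python
import unicodedata

for text in ["\u0B85", "\u0BA4\u0BC1"]:
    # Print each code point with its Unicode general category and name.
    for ch in text:
        print("U+%04X %s %s" % (ord(ch), unicodedata.category(ch), unicodedata.name(ch)))
    print("---")
```

The vowel sign's category starts with M (a combining mark), so displaying it correctly depends on how the GUI toolkit shapes it together with the preceding letter. This is consistent with the question's conclusion that the problem lies in rendering, not in Python.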

Printing a UTF char by its non-ASCII code in Python

I want to print a non-ASCII (UTF-8) character by its code rather than the character itself, using Python 2.7.
For example, I have the following:
# -*- coding: utf-8 -*-
print "…"
That works. However, I want to print '…' using its code, '\xe2', instead.
Any ideas?
On a UTF-8 terminal, printing '\xe2\x80\xa6' gives you …: in UTF-8 the character is encoded as the three bytes \xe2\x80\xa6, not just \xe2, and repr() shows them:
In [36]: print '\xe2\x80\xa6'
…
In [45]: print repr("…")
'\xe2\x80\xa6'
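Alternatively, you can refer to the character by its Unicode code point (U+2026) instead of its UTF-8 bytes. A sketch that runs on Python 2.7 and Python 3, showing that the byte escapes, the code point and the Unicode name all give the same character:

```python
# -*- coding: utf-8 -*-
# Runs on Python 2.7 and Python 3.
import unicodedata

from_bytes = b"\xe2\x80\xa6".decode("utf-8")   # the three UTF-8 bytes
from_code_point = u"\u2026"                     # the Unicode code point U+2026
from_name = u"\N{HORIZONTAL ELLIPSIS}"          # the character's Unicode name

print(from_bytes == from_code_point == from_name)  # True
print(unicodedata.name(from_code_point))           # HORIZONTAL ELLIPSIS
print(from_code_point)  # displays the ellipsis if the terminal supports it
```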

What's the difference between 'coding=utf8' and '-*- coding: utf-8 -*-'?

Is there any difference between using
#coding=utf8
and
# -*- coding: utf-8 -*-
And what about
# encoding: utf-8
There is no difference; Python recognizes all three. It looks for the pattern:
coding[:=]\s*([-\w.]+)
on the first two lines of the file (the line must also be a comment, i.e. start with #).
That is the literal text 'coding', followed by either a colon or an equals sign, followed by optional whitespace. The word, dash and dot characters that follow are read as the codec name.
The -*- markers are Emacs-specific syntax that tells that text editor which encoding to use, so the comment is useful to two tools. Vim supports a similar syntax.
See PEP 263: Defining Python Source Code Encodings.
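The question also contrasts utf8 with utf-8. Both spellings are accepted because they resolve to the same codec; a quick check with the standard codecs module (Python 2.7 and Python 3):

```python
import codecs

# Each spelling is looked up in the codec registry and resolves to the
# same codec, whose canonical name is "utf-8".
for name in ["utf8", "utf-8", "UTF-8", "utf_8"]:
    print("%s -> %s" % (name, codecs.lookup(name).name))
```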
