Printing a UTF char by its non-ASCII code in Python

I want to print a non-ASCII (UTF-8) character by its code rather than by typing the character itself, using Python 2.7.
For example, I have the following:
# -*- coding: utf-8 -*-
print "…"
and that's OK. However, I want to print '…' using '\xe2', the corresponding code, instead.
Any ideas?

Printing '\xe2\x80\xa6' will give you …. The character takes three bytes in UTF-8, so '\xe2' on its own is not enough.
In [36]: print '\xe2\x80\xa6'
…
In [45]: print repr("…")
'\xe2\x80\xa6'
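To see where the three bytes come from, here is a minimal sketch that starts from the code point U+2026 (HORIZONTAL ELLIPSIS) and encodes it. The variable names are only illustrative, and the snippet is written so that it should run on Python 2.7 as well as Python 3.

```python
# -*- coding: utf-8 -*-
# U+2026 is the code point of the ellipsis character.
ellipsis = u'\u2026'

# Encoding to UTF-8 yields the byte sequence used in the answer above.
utf8_bytes = ellipsis.encode('utf-8')

# Decoding those bytes gives the original character back.
round_trip = utf8_bytes.decode('utf-8')

print(len(utf8_bytes))          # 3
print(round_trip == ellipsis)   # True
```

So the "code" of a character depends on what you mean: the code point is a single number (U+2026), while the UTF-8 form is a sequence of bytes.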

Related

Print Urdu/Arabic Language in Console (Python)

I am a newbie and I don't know how to set up my console to print Urdu/Arabic characters. I am using Wing IDE, and when I run this code:
print "طجکسعبکبطکسبطب"
I get this on my console:
طجکسعبکبطکسبطب
You should make your string literals Unicode strings: declare the source encoding as UTF-8 at the top of the file and mark the individual string literals as unicode (u'your text').
Additionally, make sure that your terminal/prompt window is set up to display Unicode too.
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
arabic_words = u'لغت العربیه'
print arabic_words
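If the output still looks wrong, it may help to check what encoding Python thinks the console uses. The following is a rough sketch, not a definitive diagnosis: can_encode is a hypothetical helper written for this example, and the fallback to 'ascii' when no encoding is reported is an assumption.

```python
# -*- coding: utf-8 -*-
import sys


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except (UnicodeEncodeError, LookupError):
        return False


# The first word of the example above, written with escapes.
arabic_words = u'\u0644\u063a\u062a'

# sys.stdout.encoding can be None (for example when output is piped),
# so fall back to 'ascii' in that case (an assumption for this sketch).
console_encoding = getattr(sys.stdout, 'encoding', None) or 'ascii'

print(console_encoding)
print(can_encode(arabic_words, console_encoding))
```

If the second line prints False, the console encoding cannot represent the text, and changing the IDE or terminal settings is the place to look.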

Python string.title() issue with German umlauts

I get strange behavior from the Python string.title() function if the string contains German umlauts (üöä): not only is the first character of the string capitalized, but so is the character following the umlaut.
# -*- coding: utf-8 -*-
a = "müller"
print a.title()
# this returns >MüLler< , not >Müller< as expected
I tried to fix this by setting the locale to a German UTF-8 charset, but without success:
import locale
locale.setlocale(locale.LC_ALL, 'de_DE.UTF-8')
a="müller"
print a.title()
# same value >MüLler<
Any ideas to prevent the capitalization after the umlaut?
My Python version is 2.6.6 on Debian Linux.
Decode your string to Unicode, then use unicode.title():
>>> a = "müller"
>>> a.decode('utf8').title()
u'M\xfcller'
>>> print a.decode('utf8').title()
Müller
You can always encode to UTF-8 again later on.
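The difference can be reproduced with a short sketch that compares title() on the UTF-8 bytes and on the decoded text. The variable names are illustrative, and the snippet is intended to behave the same on Python 2.7 (default C locale) and Python 3.

```python
# -*- coding: utf-8 -*-
raw = b'm\xc3\xbcller'        # UTF-8 bytes of "müller"
text = raw.decode('utf-8')    # the same name as a Unicode string

# Byte-wise: \xc3 and \xbc are not treated as letters, so the "l" after
# them looks like the start of a new word and gets capitalized.
bytes_title = raw.title()

# Character-wise: ü is a lowercase letter, so the word is not split.
text_title = text.title()

print(bytes_title == b'M\xc3\xbcLler')   # True
print(text_title == u'M\xfcller')        # True

# Encode again if the rest of the program expects UTF-8 bytes.
encoded_again = text_title.encode('utf-8')
```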

Display unicode in python without print

I would like to display unicode characters without using print, for example:
>>> print 'é'
é
The unicode is displayed perfectly, but when I try to display it without print I get unwanted results:
>>> 'é'
'\xc3\xa9'
And the expected result is 'é'.
EDIT
The reason I need this feature is that I'm writing a scraper with the Scrapy framework, and I'm crawling a website with Unicode characters. When I start crawling, the log displays something like this:
\u06a9\u06cc\u0644\u0648 \u0645\u062a\u0631 \u0628\u0631 \u0633\u0627\u0639\u062a\r\n\r\n
I've tried to use the unicode built-in function, and I've added the header
# -*- coding: utf-8 -*-
But without any results
Python 3 could be your solution: its strings are Unicode by default, and the interactive prompt shows printable non-ASCII characters as they are.
I'm not sure why you want to do this, but the print statement converts objects using the string conversion rules, so you're seeing the value after that conversion.
A bare expression at the interactive prompt is displayed using repr() instead, which in Python 2 escapes the non-ASCII bytes.
https://docs.python.org/2/reference/simple_stmts.html#grammar-token-print_stmt
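As a rough illustration, the escaped forms in the question are just another spelling of the same data. The sketch below decodes the UTF-8 bytes shown at the prompt and, assuming the log line really contains literal backslash-u escapes, turns a fragment of it back into characters. Names such as log_fragment are made up for this example.

```python
# -*- coding: utf-8 -*-
# What the Python 2 prompt shows for 'é' is the UTF-8 byte sequence.
raw = b'\xc3\xa9'
text = raw.decode('utf-8')      # one character, U+00E9

# A fragment of the log line, kept as literal backslash escapes.
log_fragment = r'\u06a9\u06cc\u0644\u0648'

# The 'unicode_escape' codec interprets the escapes as code points.
decoded = log_fragment.encode('ascii').decode('unicode_escape')

print(len(raw), len(text))      # 2 bytes, 1 character
print(len(log_fragment))        # 24 characters of escape text
print(len(decoded))             # 4 actual characters
```

Whether the decoded characters display correctly then depends on the terminal or log viewer, not on the data itself.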

Support more characters in Python 2.7

Whenever I try to use the characters "šđžćč" in Python 2.7, the console gives a non-ASCII character error.
This is fixed by adding # -*- coding: utf-8 -*- to the header.
However, when I try to use the characters, this happens. E.g.:
The code is print "Upiši svoj tekst:" but Upi┼íi svoj tekst: is printed.
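One plausible explanation, assuming the console uses a legacy DOS code page such as cp437, is that the UTF-8 bytes of the string are written out unchanged and the console interprets each byte as a separate character. The sketch below reproduces the garbled text under that assumption. It does not prove that this is what happens on the asker's machine.

```python
# -*- coding: utf-8 -*-
prompt = u'Upi\u0161i svoj tekst:'     # "Upiši svoj tekst:"

# A plain "..." literal in a UTF-8 source file holds these bytes in Python 2.
utf8_bytes = prompt.encode('utf-8')

# Interpreting the same bytes as cp437 (assumed console code page)
# turns the two bytes of "š" into two unrelated characters.
garbled = utf8_bytes.decode('cp437')

print(garbled == u'Upi\u253c\xedi svoj tekst:')   # True
```

If that is the cause, using a unicode literal (u"Upiši svoj tekst:") lets print encode the text for the console instead of passing the raw UTF-8 bytes through, provided the console code page contains the characters.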

Python: What's the equivalent of String[a:b] but for Unicode

So I have something like this:
x = "CЕМЬ"
x[:len(x)-1]
Which is to remove the last character from the string.
However, it doesn't work and it gives me an error. I figured it's because it's Unicode. So how do you do this simple formatting on non-ANSI strings?
That's because in Python 2.x, "CЕМЬ" is a strange way of writing the byte string b'C\xd0\x95\xd0\x9c\xd0\xac'.
You want a character string. In Python 2.x, character strings are prefixed with a u:
x = u"CЕМЬ"
x[:-1] # Returns u"CЕМ" (len(x) is implicit for negative values)
If you're writing this in a program (as opposed to an interactive shell), you will want to specify a source code encoding. To do that, simply add the following line to the beginning of the file, where utf-8 matches your file encoding:
# -*- coding: utf-8 -*-
Save the file with UTF-8 encoding:
# -*- coding: utf-8 -*-
x = u'CЕМЬ'
print x[:-1] #prints CЕМ
x = u'some string'
x2 = x[:-1]
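To make the difference concrete, here is a small sketch that slices both forms of the string from the question. It is written with escapes so it does not depend on the file encoding; the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
text = u'C\u0415\u041c\u042c'     # "CЕМЬ" as a character string
raw = text.encode('utf-8')        # the byte string a plain "..." holds in Python 2

print(len(text))   # 4 characters
print(len(raw))    # 7 bytes: the Cyrillic letters take two bytes each

# Slicing the character string removes exactly one character.
trimmed_text = text[:-1]

# Slicing the byte string removes one byte and cuts the last letter in half,
# so the result is no longer valid UTF-8.
trimmed_raw = raw[:-1]
try:
    trimmed_raw.decode('utf-8')
    raw_still_valid = True
except UnicodeDecodeError:
    raw_still_valid = False

print(raw_still_valid)   # False
```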
