Python chr() explain - python

So I am pretty sure this is a dumb question, but I am trying to get a deeper understanding of the python chr() function.
Also, I am wondering if it is possible to always have the integer argument three digits long, or just a fixed length for all ascii values?
chr(20) ## '\x14'
chr(020) ## '\x10'
Why is it giving me different answers? Does it think '020' is hex or something?
Also, I am running Python 2.7 on Windows!
-Thanks!

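This has nothing to do with chr. It is all about numeric literals, and the convention is shared by many languages: in Python 2, a leading 0 marks an octal literal and a leading 0x marks a hexadecimal one. So 020 is octal for 16, which is why chr(020) gives '\x10'.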
print 010 # 8
print 0x10 # 16
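For readers on Python 3, where a bare 020 is a syntax error, the same distinction can be sketched roughly as follows. This is only an illustration with made-up variable names; the last line shows one way to get a fixed three-digit display without changing the value.

```python
# Python 3 sketch: octal needs the 0o prefix (a bare 020 is a SyntaxError here).
decimal_value = 20
octal_value = 0o20    # octal 20 is decimal 16
hex_value = 0x10      # hex 10 is also decimal 16

# chr() only sees the integer, never the way it was spelled in the source.
same_char = chr(octal_value) == chr(hex_value) == "\x10"

# Parsing zero-padded text: int() treats it as base 10 unless told otherwise.
parsed_decimal = int("020")       # 20
parsed_octal = int("020", 8)      # 16

# Zero-padding is a display concern, so format the number instead of
# writing the literal with leading zeros.
padded = format(decimal_value, "03d")   # '020'
```

If the three-digit form arrives as text, converting it with int() before calling chr() avoids the octal surprise.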

It makes sense to explain chr and ord together.
You are obviously using Python2 (because of the octal problem, Python3 requires 0o as the prefix), but I'll explain both.
In Python2, chr is a function that takes any integer from 0 to 255 and returns a string containing just that single byte (an extended-ascii character). unichr is the same but returns a unicode character, for code points up to 0x10FFFF (0xFFFF on narrow builds). ord is the inverse function, which takes a single-character string (of either type) and returns an integer.
In Python3, chr returns a single-character unicode string. The equivalent for byte strings is bytes([v]). ord still does both.
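A small Python 3 sketch of the three pieces mentioned above, chr, bytes([v]) and ord, might look like this. The value 200 and the variable names are arbitrary choices for the example.

```python
code = 200

text_char = chr(code)       # one-character str: 'È' (U+00C8)
byte_char = bytes([code])   # one-byte bytes object: b'\xc8'

# ord() accepts either a length-1 str or a length-1 bytes object.
from_text = ord(text_char)
from_bytes = ord(byte_char)

# Indexing a bytes object already yields an int, so ord() is optional there.
indexed = byte_char[0]
```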

Related

Converting integer to 8-bit ASCII characters, NOT Unicode in Python 3

I've been working on a project where I'm encoding numbers as characters. Being used to C++, I assumed I could just use any 8bit number and cast it to a character. However, python's chr() function is returning Unicode characters, which aren't 8-bit, so that will not work.
I am new to Python and, from what I've read, previous versions used to have 2 separate functions: chr() for ASCII characters and unichr() for Unicode characters.
I am also limited to what I can get in the standard python library for windows (we are not allowed to install modules with pip).
This might usually be okay, but here's an example of when this can mess with my program:
If I'm encoding the integer 143:
# this is not taken from my actual code
num = 143
c = chr(143)
print(c)
I would expect this to print the ASCII character (a capital A with a little circle above it). Instead, I get the unicode \x8f, which represents "SS3" (Single Shift 3).
TL;DR: I'm converting 8-bit numbers to characters, but chr() converts to Unicode and I REALLY need a way to convert to ASCII instead, but I can't seem to find it in the standard library.
I know that this is such a simple problem and it's extremely frustrating to be stuck on this of all things.
Thanks a lot in advance!
Have a nice day!
- Vlad
"A with a little circle above it" is not an ASCII character, and 143 is outside the ASCII range (0-127).
It seems you are thinking in terms of encoded bytes rather than Unicode code points (which Python3 uses to represent string values). In some 8-bit encodings, such as cp437, b'\x8f' represents 'Å'.
You probably want to do something like this:
import sys
c = 143
# Convert to byte
b = c.to_bytes(1, sys.byteorder)
# Decode to unicode (str) and print
print(b.decode('cp437'))
Å
You could also take a look at the struct package in the standard library, which deals with bytes and chars in a more "C-like" fashion.
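As a rough sketch of both suggestions, using only the standard library: bytes([n]) is an alternative spelling of the to_bytes call above, and struct.pack with the "B" (unsigned char) format produces the same byte. Which character the byte displays as depends on the code page, so cp437 here is an assumption about what the asker's table uses.

```python
import struct

num = 143

# bytes([n]) builds a one-byte bytes object for any n in range(256).
raw = bytes([num])                  # b'\x8f'

# struct.pack with the "B" format (unsigned char) gives the same byte.
packed = struct.pack("B", num)

# Which character that byte stands for depends on the chosen code page.
as_cp437 = raw.decode("cp437")      # 'Å' in the IBM PC code page 437
as_latin1 = raw.decode("latin-1")   # the C1 control character U+008F

# Encoding the character goes back to the original number.
round_trip = as_cp437.encode("cp437")[0]
```

If the goal is only to move 8-bit numbers around, keeping them as bytes and decoding at display time sidesteps the question of which character each one "is".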

Get the character that a Unicode code point corresponds

For a Computer Science class, we've got to make a python program that converts a character into its Unicode code point (the bin/hex number which is the reference to the character). Is there a function out there which can do this, like how the ord() function converts to ASCII? And is there a function which does the reverse, turning a Unicode code point into a character?
Thanks
In Python3, if you know the unicode code point of a character, for example, 我 with Unicode code point \u6211, you can get the character with:
chr(0x6211)
The builtin function ord also works for unicode characters both in Python2 and Python3.
Python 3
>>> c='\U0010ffff'
>>> ord(c)
1114111
Python 2
>>> c=u'\U0010ffff'
>>> ord(c)
1114111
Difference between Python 2 and 3
The difference between Python 2 and Python 3 is when you go the other way around.
In Python 3, the function chr can take any code, ascii or unicode, and outputs the character.
In Python 2, the function chr is for extended ascii (code 0 to 255) and the function unichr is for unicode.
This is due to the fact that in Python 2, unicode and ascii strings were two different types.
Hexadecimal
If you need to get the character code in hexadecimal, you can use hex.
>>> hex(1114111)
'0x10ffff'
Binary
If you need to get the character code in binary, you can use bin.
>>> bin(1114111)
'0b100001111111111111111'
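Putting ord, the hex and binary forms, and chr together, a minimal Python 3 sketch could look like the following. The helper name describe_code_point is made up for this example, and format() is used instead of hex()/bin() only to drop the 0x/0b prefixes.

```python
def describe_code_point(ch):
    """Return the code point of a one-character string in three notations."""
    code_point = ord(ch)
    return {
        "decimal": code_point,
        "hex": format(code_point, "04X"),   # no '0x' prefix, padded to 4 digits
        "binary": format(code_point, "b"),  # no '0b' prefix
    }

info = describe_code_point("我")

# Going back: parse the hex digits with base 16, then call chr().
restored = chr(int(info["hex"], 16))
```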

What does the name of the ord() function stand for?

The official Python documentation explains ord(c)
ord(c):
Given a string representing one Unicode character, return an integer representing the Unicode code point of that character. For example, ord('a') returns the integer 97 and ord('€') (Euro sign) returns 8364. This is the inverse of chr().
It does not specify what ord stands for, and Google searches are not helpful.
What's the origin of it?
It stands for "ordinal".
The earliest use of ord that I remember was in Pascal. There, ord() returned the ordinal value of its argument. For characters this was defined as the ASCII code.
The same convention was also used in Modula-2.
Later, Python (as well as PHP, some dialects of SQL etc) followed this convention, except that these days they're more likely to use Unicode rather than ASCII.
It could well be that the origins of the term (and the function name) go back further than Pascal.
Return the integer ordinal of a one-character string.
I took this from ord.__doc__ at the Python command line. So ord means the ordinal of a one-character string.

What's the point of chr(128) .. chr(255) in Python?

Edit: I'm talking about behavior in Python 2.7.
The chr function converts integers between 0 and 127 into the ASCII characters. E.g.
>>> chr(65)
'A'
I get how this is useful in certain situations and I understand why it covers 0..127, the 7-bit ASCII range.
The function also takes arguments from 128..255. For these numbers, it simply returns the hexadecimal representation of the argument. In this range, different bytes mean different things depending on which part of the ISO-8859 standard is used.
I'd understand if chr took another argument, e.g.
>>> chr(228, encoding='iso-8859-1') # hypothetical
'ä'
However, there is no such option:
chr(i) -> character
Return a string of one character with ordinal i; 0 <= i < 256.
My question is: What is the point of raising ValueError for i > 255 instead of i > 127? All the function does for 128 <= i < 256 is return hex values?
In Python 2.x, a str is a sequence of bytes, so chr() returns a string of one byte and accepts values in the range 0-255, as this is the range that can be represented by a byte. When you print the repr() of a string with a byte in the range 128-255, the character is printed in escape format because there is no standard way to represent such characters (ASCII defines only 0-127). You can convert it to Unicode using unicode() however, and specify the source encoding:
unicode(chr(200), encoding="latin1")
In Python 3.x, str is a sequence of Unicode characters and chr() takes a much larger range. Bytes are handled by the bytes type.
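The hypothetical chr(228, encoding=...) from the question can be approximated in Python 3 by building the byte first and decoding it afterwards. The helper byte_to_char below is an invented name for this sketch, not a standard function.

```python
def byte_to_char(i, encoding):
    """Hypothetical helper: decode the single byte i using the given codec."""
    if not 0 <= i < 256:
        raise ValueError("byte value must be in range(256)")
    return bytes([i]).decode(encoding)

latin1_char = byte_to_char(228, "iso-8859-1")     # 'ä'
cyrillic_char = byte_to_char(228, "iso-8859-5")   # a Cyrillic letter instead

# latin-1 maps every byte to the code point with the same number,
# so for 0..255 it agrees with chr().
matches_chr = all(byte_to_char(i, "latin-1") == chr(i) for i in range(256))
```

The same byte value giving two different characters is the reason a bare byte in the 128-255 range has no single printable form.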
I see what you're saying but it isn't correct. In Python 3.4 chr is documented as:
Return the string representing a character whose Unicode codepoint is the integer i.
And here are some examples:
>>> chr(15000)
'㪘'
>>> chr(5000)
'ᎈ'
In Python 2.x it was:
Return a string of one character whose ASCII code is the integer i.
The function chr has been around for a long time in Python and I think the understanding of various encodings only developed in recent releases. In that sense it makes sense to support the basic ASCII table and return hex values for the extended ASCII set within the 128 - 255 range.
Even within Unicode the ASCII set is only defined as 128 characters, not 256, so there isn't (wasn't) a standard and accepted way of letting ord() return an answer for those input values.
Note that python 2 string handling is broken. It's one of the reasons I recommend switching to python 3.
In python 2, the string type was designed to represent both text and binary strings. So, chr() is used to convert an integer to a byte. It's not really related to text, or ASCII, or ISO-8859-1. It's a binary stream of bytes:
binary_command = chr(100) + chr(200) + chr(10)
device.write(binary_command)
# ...and so on
In Python 2.6 and later, the bytes name was added for forward compatibility with Python 3; it is simply an alias for str.
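For comparison, a Python 3 version of the same idea might look like this. Since no real device is available here, io.BytesIO stands in for it; that substitution is an assumption made purely so the sketch runs.

```python
import io

# In Python 3 the byte string is built directly from integers.
binary_command = bytes([100, 200, 10])

# io.BytesIO stands in for the device here (an assumption for the sketch);
# a real device object would come from whatever I/O library is in use.
device = io.BytesIO()
device.write(binary_command)

sent = device.getvalue()
```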

How do I create unicode characters with variable numbers?

Basically what I want to do is print u'\u1001', but I do not want the 1001 hardcoded. Is there a way I can use a variable or string for this? Also, it would be nice if I could retrieve this code again, when I use the output as input.
According to the Python documentation on Unicode:
One-character Unicode strings can also be created with the unichr()
built-in function, which takes integers and returns a Unicode string
of length 1 that contains the corresponding code point. The reverse
operation is the built-in ord() function that takes a one-character
Unicode string and returns the code point value:
unichr(40960)
results in the character:
u'\ua000'
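In Python 3 there is no unichr; chr covers the whole Unicode range. A minimal sketch of the round trip asked about, with variable names chosen only for illustration, could be:

```python
code = 0x1001                  # could equally come from user input

char = chr(code)               # same string as the literal '\u1001'
recovered = ord(char)          # back to 4097

# Starting from hex digits held in a string instead of an int.
from_text = chr(int("1001", 16))

# Producing the escaped spelling again for display.
escaped = "\\u{:04x}".format(recovered)   # the six characters \u1001
```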
