The official Python documentation explains ord(c)
ord(c):
Given a string representing one Unicode character, return an integer representing the Unicode code point of that character. For example, ord('a') returns the integer 97 and ord('€') (Euro sign) returns 8364. This is the inverse of chr().
It does not say what "ord" stands for, and Google searches have not been helpful.
What's the origin of it?
It stands for "ordinal".
The earliest use of ord that I remember was in Pascal. There, ord() returned the ordinal value of its argument. For characters this was defined as the ASCII code.
The same convention was also used in Modula-2.
Later, Python (as well as PHP, some dialects of SQL etc) followed this convention, except that these days they're more likely to use Unicode rather than ASCII.
It could well be that the origins of the term (and the function name) go back further than Pascal.
Return the integer ordinal of a one-character string.
I took this from ord.__doc__ at the Python command line. ord means the ordinal of a one-character string.
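A quick check makes the relationship between ord() and chr() concrete. This is a minimal sketch that uses only the built-ins mentioned above; the variable names are illustrative, and the exact wording of the docstring depends on the Python version.

```python
# ord() maps a one-character string to its Unicode code point.
code_a = ord("a")
code_euro = ord("€")
print(code_a, code_euro)

# chr() is the inverse: it maps the code point back to the character.
round_trip = chr(code_euro)
print(round_trip)

# The built-in docstring; its wording differs between Python versions.
print(ord.__doc__)
```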
Using Python, how would I encode all the characters of a string to a URL-encoded string?
As of now, just about every answer eventually references the same methods, such as urllib.parse.quote() or urllib.parse.urlencode(). While these answers are technically valid (they follow the required specifications for encoding special characters in URLs), I have not managed to find a single answer that describes how to encode other/non-special characters as well (such as lowercase or uppercase letters).
How do I take a string and encode every character into a URL-encoded string?
This gist reveals a very nice answer to this problem. The final function code is as follows:
def encode_all(string):
    return "".join("%{0:0>2}".format(format(ord(char), "x")) for char in string)
Let's break this down.
The first thing to notice is that the return value is a generator expression (... for char in string) wrapped in a str.join call ("".join(...)). This means we will be performing an operation for each character in the string, then finally joining each outputted string together (with the empty string, "").
The operation performed on each character in the string is "%{0:0>2}".format(format(ord(char), "x")). This can be broken down into the following:
ord(char): Convert each character to the corresponding number.
format(..., "x"): Convert the number to a hexadecimal value.
"%{0:0>2}".format(...): Left-pad the hexadecimal value with zeros to at least two digits and prefix it with "%".
Taken as a whole, the function converts each character to a number, converts that number to hexadecimal, then joins all the resulting "%xx" pieces into a single string (which is then returned).
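One caveat is worth sketching. The function works on code points, so a character above U+00FF produces more than two hex digits and no longer round-trips through a URL decoder. The variant below, encode_all_utf8, is a hypothetical helper (not part of the gist) that assumes the target expects percent-encoded UTF-8 bytes, which is what urllib.parse.unquote decodes by default.

```python
from urllib.parse import unquote

def encode_all(string):
    """Percent-encode every character using its code point (the gist's approach)."""
    return "".join("%{0:0>2}".format(format(ord(char), "x")) for char in string)

def encode_all_utf8(string):
    """Hypothetical variant: percent-encode every UTF-8 byte of the string."""
    return "".join("%{:02X}".format(byte) for byte in string.encode("utf-8"))

ascii_encoded = encode_all("abc")
utf8_encoded = encode_all_utf8("a é")
print(ascii_encoded)
print(utf8_encoded)

# Decoding the UTF-8 variant gives back the original text.
decoded = unquote(utf8_encoded)
print(decoded)
```

For pure ASCII input the two functions differ only in the letter case of the hex digits.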
I've been working on a project where I'm encoding numbers as characters. Being used to C++, I assumed I could just take any 8-bit number and cast it to a character. However, Python's chr() function returns Unicode characters, which aren't 8-bit, so that will not work.
I am new to Python and, from what I've read, previous versions used to have 2 separate functions: chr() for ASCII characters and unichr() for Unicode characters.
I am also limited to what I can get in the standard Python library for Windows (we are not allowed to install modules with pip).
This might usually be okay, but here's an example of when this can mess with my program:
If I'm encoding the integer 143:
# this is not taken from my actual code
num = 143
c = chr(num)
print(c)
I would expect this to print the ASCII character (a capital A with a little circle above it). Instead, I get the unicode \x8f, which represents "SS3" (Single Shift 3).
TL;DR: I'm converting 8-bit numbers to characters, but chr() converts to Unicode and I REALLY need a way to convert to ASCII instead, but I can't seem to find it in the standard library.
I know that this is such a simple problem and it's extremely frustrating to be stuck on this of all things.
Thanks a lot in advance!
Have a nice day!
- Vlad
"A with a little circle above it" is not an ASCII character, and 143 is outside the ASCII range (0-127).
It seems you are thinking in terms of encoded bytes rather than Unicode code points (which Python 3 uses to represent string values). There are 8-bit encodings, such as code page 437, in which b'\x8f' represents 'Å'.
You probably want to do something like this:
import sys
c = 143
# Convert to byte
b = c.to_bytes(1, sys.byteorder)
# Decode to unicode (str) and print
print(b.decode('cp437'))
Å
You could also take a look at the struct module in the standard library, which deals with bytes and chars in a more "C-like" fashion.
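The difference between the byte value and the character it stands for can be sketched as follows. This assumes Python 3 and uses only built-in codecs; cp437 is taken from the answer above, and latin-1 is added purely for contrast.

```python
value = 143

# One integer in range(256) becomes exactly one byte.
raw = bytes([value])

# Turning that byte into text requires choosing an encoding.
as_cp437 = raw.decode("cp437")
as_latin1 = raw.decode("latin-1")
print(as_cp437)
print(repr(as_latin1))

# Going back: encode the character and read the single byte value.
back = as_cp437.encode("cp437")[0]
print(back)
```

Decoding with latin-1 gives the same result as chr(143), because that encoding maps each byte to the code point with the same number.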
So I am pretty sure this is a dumb question, but I am trying to get a deeper understanding of the python chr() function.
Also, I am wondering if it is possible to always have the integer argument three digits long, or just a fixed length for all ascii values?
chr(20) ## '\x14'
chr(020) ## '\x10'
Why is it giving me different answers? Does it think '020' is hex or something?
Also, I am running Python 2.7 on Windows!
-Thanks!
This has nothing to do with chr. It is all about numeric literals, and the convention is shared by many languages: a leading 0 indicates octal and a leading 0x indicates hexadecimal.
print 010 # 8
print 0x10 # 16
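For comparison, here is a small Python 3 sketch of the same idea. In Python 3 a bare leading zero such as 020 is a syntax error and octal literals are written with 0o. The last lines address the fixed-width part of the question under the assumption that the three-digit values arrive as text; the variable names are illustrative.

```python
# Python 3 spells octal literals with a 0o prefix.
octal_value = 0o20
hex_value = 0x10
print(octal_value, hex_value)

# Text with leading zeros is parsed as decimal unless a base is given.
decimal_from_text = int("020")
octal_from_text = int("020", 8)
print(decimal_from_text, octal_from_text)

# Zero-pad an integer to three digits for display.
padded = format(20, "03d")
print(padded)
```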
It makes sense to explain chr and ord together.
You are obviously using Python2 (because of the octal problem, Python3 requires 0o as the prefix), but I'll explain both.
In Python2, chr is a function that takes any integer from 0 to 255 and returns a string containing just that extended-ASCII character. unichr is the same but returns a unicode character, for code points up to 0x10FFFF. ord is the inverse function, which takes a single-character string (of either type) and returns an integer.
In Python3, chr returns a single-character unicode string. The equivalent for byte strings is bytes([v]). ord still does both.
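The Python 3 behavior described above can be checked with a short sketch; the variable names are illustrative.

```python
# chr() returns a one-character str for any valid code point.
text_char = chr(8364)
print(text_char)

# The byte-string counterpart is built from a list of integers.
byte_char = bytes([65])
print(byte_char)

# ord() accepts a length-1 str as well as a length-1 bytes object.
code_from_text = ord(text_char)
code_from_byte = ord(byte_char)
print(code_from_text, code_from_byte)
```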
Edit: I'm talking about behavior in Python 2.7.
The chr function converts integers between 0 and 127 into the ASCII characters. E.g.
>>> chr(65)
'A'
I get how this is useful in certain situations and I understand why it covers 0..127, the 7-bit ASCII range.
The function also takes arguments from 128..255. For these numbers, it simply returns the hexadecimal representation of the argument. In this range, different bytes mean different things depending on which part of the ISO-8859 standard is used.
I'd understand if chr took another argument, e.g.
>>> chr(228, encoding='iso-8859-1') # hypothetical
'ä'
However, there is no such option:
chr(i) -> character
Return a string of one character with ordinal i; 0 <= i < 256.
My question is: What is the point of raising ValueError for i > 255 instead of i > 127? All the function does for 128 <= i < 256 is return hex values?
In Python 2.x, a str is a sequence of bytes, so chr() returns a string of one byte and accepts values in the range 0-255, as this is the range that can be represented by a byte. When you print the repr() of a string with a byte in the range 128-255, the character is printed in escape format because there is no standard way to represent such characters (ASCII defines only 0-127). You can convert it to Unicode using unicode() however, and specify the source encoding:
unicode(chr(200), encoding="latin1")
In Python 3.x, str is a sequence of Unicode characters and chr() takes a much larger range. Bytes are handled by the bytes type.
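A rough Python 3 counterpart of the unicode(chr(200), encoding="latin1") call is sketched below. It assumes the byte really is Latin-1 data; with a different source encoding the same byte would decode to a different character.

```python
# A single byte with value 200; its repr uses an escape, as in Python 2.
raw = bytes([200])
print(repr(raw))

# Decoding attaches a meaning to the byte by naming the source encoding.
as_latin1 = raw.decode("latin-1")
print(as_latin1)

# In Python 3, chr(200) gives the same character directly, because
# Latin-1 bytes map to the Unicode code points with the same numbers.
same_char = chr(200)
```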
I see what you're saying but it isn't correct. In Python 3.4 chr is documented as:
Return the string representing a character whose Unicode codepoint is the integer i.
And here are some examples:
>>> chr(15000)
'㪘'
>>> chr(5000)
'ᎈ'
In Python 2.x it was:
Return a string of one character whose ASCII code is the integer i.
The function chr has been around for a long time in Python and I think the understanding of various encodings only developed in recent releases. In that sense it makes sense to support the basic ASCII table and return hex values for the extended ASCII set within the 128 - 255 range.
Even within Unicode the ASCII set is only defined as 128 characters, not 256, so there isn't (wasn't) a standard and accepted way of letting chr() return a character for those input values.
Note that Python 2 string handling is broken. It's one of the reasons I recommend switching to Python 3.
In Python 2, the string type was designed to represent both text and binary strings. So, chr() is used to convert an integer to a byte. It's not really related to text, or ASCII, or ISO-8859-1. It's a binary stream of bytes:
binary_command = chr(100) + chr(200) + chr(10)
device.write(binary_command)
etc()
In Python 2.6, the bytes type was added for forward compatibility with Python 3; there it is simply an alias for str.
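In Python 3 the same binary command would be built with the bytes type rather than with chr(). The sketch below uses io.BytesIO as a stand-in for the device, which is an assumption made only so that the example runs on its own.

```python
import io

# Stand-in for the real device (hypothetical; any object with write() works).
device = io.BytesIO()

# Build the three-byte command directly from integer values.
binary_command = bytes([100, 200, 10])
written = device.write(binary_command)

print(binary_command)
print(written)
```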
In C, I can do this: printf("%c%c",-108,-109);. How do I do the same in Python?
args[2]='Test%c%cFile'% (-108,-109)
OverflowError: unsigned byte integer is less than minimum
From the comments, it seems you want to put some non-ASCII characters into a string. Hacks involving negative integers are not necessary in Python, which has perfectly good Unicode strings for holding any character you could dream of. Use a code chart (like this one) to find the code points for your characters, then insert them into your string using the \uxxxx syntax.
Edit: You seem to be after U+9493 CJK UNIFIED IDEOGRAPH-9493, which can be inserted into a string using
s = u"\u9493"
(Thanks @KeithThompson!)
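If the raw bytes from the C call are what is needed, the signed values can be converted explicitly. This is a sketch assuming Python 3. The names are illustrative, and reading the two bytes as one big-endian UTF-16 code unit is only one possible interpretation, chosen because it matches the U+9493 guess above.

```python
import struct

signed_values = (-108, -109)

# "b" is a signed char, so struct accepts the negative values as C does.
raw = struct.pack("2b", *signed_values)

# The same bytes, obtained by masking each value to its low 8 bits.
same_raw = bytes(v & 0xFF for v in signed_values)
print(raw)

# Assumption: the two bytes form one big-endian UTF-16 code unit.
as_text = raw.decode("utf-16-be")

# The escape syntax puts the same character into a string directly.
name = "Test\u9493File"
print(name)
```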
The real answer is that you have been relying on non-standard behavior in your C code (one of the language lawyers can butt in and tell you if it's truly undefined or not).
You need to look into how to properly support non-ASCII, multibyte character sets in Python rather than just trying to duplicate your incorrect C code.
There are plenty of resources that will help with correct "i18n" of an app.