to_bytes function returning odd value - python

I am having trouble with the to_bytes method of the int type. Values larger than about 18200000 give me a weird-looking bytes object as output.
I am using Python 3.5 on a Raspberry Pi. The threshold is not exactly 18200000, but it is close.
The way I call the function is like this:
frequency = 20000000
print(frequency.to_bytes(4,byteorder='big'))
The expected result would be b'\x01\x31\x2D\x00'
What I get is b'\x011-\x00'.

For bytes that are printable ASCII characters, Python displays the corresponding character instead of the \x escape. \x31 is the character 1, and \x2D is -, so both spellings describe the same four bytes.
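A small sketch to check this, assuming Python 3.5 or later (for bytes.hex()) and a length of 4 bytes, which matches the four-byte result shown in the question. The variable name raw is only for illustration:

```python
frequency = 20000000

# Four bytes are enough for this value (0x01312D00).
raw = frequency.to_bytes(4, byteorder='big')

print(raw)        # b'\x011-\x00'  (0x31 is shown as '1', 0x2D as '-')
print(list(raw))  # [1, 49, 45, 0]
print(raw.hex())  # 01312d00

# The escaped spelling and the "printable" spelling are the same object value.
print(raw == b'\x01\x31\x2D\x00')  # True
```

If you want to see every byte as hex, print raw.hex() or list(raw) rather than the bytes object itself.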

Related

How does ord() convert bytes to int

I'm using PySerial and trying to receive a byte array {1,2,3,4,5,6,7,8,9,10,11} sent from an MCU. Here's the array I get from PySerial:
b'\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b'
Looking at the first few "elements" (and the last one as well), I first thought they were just hex numbers, until I saw '\t' and '\n'.
So I tried to see the output of ord(b'\t') and it does indeed give me the int 9. I'm a bit confused, since ord() is supposed to return the Unicode code point.
Why is 9 represented as b'\t' and 10 as b'\n'? What is this representation and can I find like a conversion table anywhere?
Thank you!
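A short sketch of what is going on, assuming Python 3. The repr of a bytes object uses the usual string escapes (\t, \n, \r) and plain ASCII characters where they exist, and \xNN for everything else. The values themselves are unchanged. The name data below is a stand-in for whatever PySerial returned:

```python
data = bytes(range(1, 12))   # the values 1..11, standing in for the received data

print(data)        # b'\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b'
print(list(data))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(data[8])     # 9  (indexing a bytes object gives an int)

# b'\t' and b'\x09' are two spellings of the same single byte.
print(b'\t' == b'\x09', ord(b'\t'))  # True 9
```

For code points 0 to 127, ASCII and Unicode agree, so any ASCII table doubles as the conversion table.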

Python: different byte values for the same character?

The program I'm writing captures individual keypresses with the function msvcrt.getch(), which works very similarly to the C function of the same name, but instead of returning a char variable, it returns a byte, which I have to decode afterwards.
However, it has a problem decoding non-ASCII characters, like accented letters (it triggers a UnicodeDecodeError), so I handle this exception with a function that compares the returned byte value with a list of byte values of the special characters I want, and if it matches one of them, the function returns its char equivalent.
The problem is that I noticed the byte value is different on the two machines I use (probably something to do with the systems being set to different languages, and/or my using keyboards with different layouts).
For example, if I input the character à, the byte value returned will be b'\x85' in one machine, and b'\xe0' in the other.
Why does this happen? How can I make a "universal solution" (elegant, preferably) that can work as I want in any machine?
Use msvcrt.getwch().
It returns a str (rather than a bytes object) that contains the character, and it works with Unicode rather than with a machine-specific byte encoding.
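As for why the byte differs: one plausible explanation, and it is only an assumption since the question does not say which code pages the two machines use, is that the consoles run different legacy code pages. The sketch below uses cp850 and cp1252 as examples, written for Python 3:

```python
ch = '\u00e0'   # the character à

# The same character maps to different single bytes in different code pages.
print(ch.encode('cp850'))    # b'\x85'
print(ch.encode('cp1252'))   # b'\xe0'

# Decoding each byte with the matching code page gives the same str back.
print(b'\x85'.decode('cp850') == b'\xe0'.decode('cp1252'))  # True
```

Working with str, as getwch() does, sidesteps the question of which code page produced the byte.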

Python: Converting special characters into operable integers?

I am currently working on a really simple encryption project algorithm to show basic understanding of how encryption works, and my encryption algorithm basically just uses the 'ord()' function for converting standard ASCII characters into integers that the algorithm can work on.
The problem I have run into is that I also need my program to be capable of encrypting, for example, the contents of a Windows executable (EXE) file. To do so, I need to convert all sorts of special characters (Not ASCII) into integers that I can operate off of.
I don't know a whole lot about encoding, but from what I understand, 'ord()' only works because there is an ASCII character map that has a corresponding number for each character. I couldn't figure out how to convert the special characters of an EXE file straight to integers, so I tried converting to bytes, which seems a little more universal to me (please correct me if I am wrong).
At this point, I am just looking for a solution to be able to read an EXE file, and convert each character into a number specific to that character (for encryption/ decryption purposes).
You are confusing the meaning assigned to bytes (like the ASCII standard) with the bytes themselves. ord() just gives you the numerical value for a given byte. That Python interprets those bytes and shows you ASCII codepoints is neither here nor there.
In other words, ord() doesn't have to consult an ASCII table and can handle any byte value. All it has to do is take the already known byte value and give you a Python int object for it.
Read your data as binary (open the file with b added to the file mode), and use ord(). In Python 2, that'll result in str objects, and each character in such an object is really a byte value in the range 0 - 255.
Note that if you are using Python 3, reading from a file in binary mode results in a bytes object that makes it clearer still that these are integer values in a range:
>>> b'abc'
b'abc'
>>> b'abc'[0]
97
Indexing to an individual point in a bytes object produces the integer value and no call to ord() is required.
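A minimal sketch of the binary-mode approach in Python 3. The temporary file and its five byte values are made up for the example and only stand in for a real EXE:

```python
import os
import tempfile

# Write a few arbitrary byte values to a temporary file (a stand-in for an EXE).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes([0x4D, 0x5A, 0x90, 0x00, 0xFF]))
    path = tmp.name

with open(path, 'rb') as f:   # 'rb' = read, binary: no decoding takes place
    data = f.read()

values = list(data)           # each element is an int in range(256)
print(values)                 # [77, 90, 144, 0, 255]

# Going back from integers to bytes is lossless.
print(bytes(values) == data)  # True

os.remove(path)
```

Because every value is already an int between 0 and 255, the encryption step can work on values directly and write the result back with bytes(...).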

What's the point of chr(128) .. chr(255) in Python?

Edit: I'm talking about behavior in Python 2.7.
The chr function converts integers between 0 and 127 into the ASCII characters. E.g.
>>> chr(65)
'A'
I get how this is useful in certain situations and I understand why it covers 0..127, the 7-bit ASCII range.
The function also takes arguments from 128..255. For these numbers, it simply returns the hexadecimal representation of the argument. In this range, different bytes mean different things depending on which part of the ISO-8859 standard is used.
I'd understand if chr took another argument, e.g.
>>> chr(228, encoding='iso-8859-1') # hypothetical
'ä'
However, there is no such option:
chr(i) -> character
Return a string of one character with ordinal i; 0 <= i < 256.
My question is: What is the point of raising ValueError for i > 255 instead of i > 127? All the function does for 128 <= i < 256 is return hex values?
In Python 2.x, a str is a sequence of bytes, so chr() returns a string of one byte and accepts values in the range 0-255, as this is the range that can be represented by a byte. When you print the repr() of a string with a byte in the range 128-255, the character is printed in escape format because there is no standard way to represent such characters (ASCII defines only 0-127). You can convert it to Unicode using unicode() however, and specify the source encoding:
unicode(chr(200), encoding="latin1")
In Python 3.x, str is a sequence of Unicode characters and chr() takes a much larger range. Bytes are handled by the bytes type.
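In Python 3 the hypothetical "chr with an encoding" from the question is spelled as a decode on a bytes object. A small sketch, where the choice of the two ISO-8859 parts is just for illustration:

```python
one_byte = bytes([228])   # a single byte with the value 228 (0xE4)

print(one_byte)                       # b'\xe4'
print(one_byte.decode('iso-8859-1'))  # ä
print(one_byte.decode('iso-8859-7'))  # δ

# chr() in Python 3 works on Unicode code points, not on bytes.
print(chr(228) == one_byte.decode('iso-8859-1'))  # True
```

The byte stays the same; only the encoding passed to decode() decides which character it stands for.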
I see what you're saying but it isn't correct. In Python 3.4 chr is documented as:
Return the string representing a character whose Unicode codepoint is the integer i.
And here are some examples:
>>> chr(15000)
'㪘'
>>> chr(5000)
'ᎈ'
In Python 2.x it was:
Return a string of one character whose ASCII code is the integer i.
The function chr has been around for a long time in Python and I think the understanding of various encodings only developed in recent releases. In that sense it makes sense to support the basic ASCII table and return hex values for the extended ASCII set within the 128 - 255 range.
Even within Unicode the ASCII set is only defined as 128 characters, not 256, so there isn't (wasn't) a standard and accepted way of letting chr() return an answer for those input values.
Note that Python 2 string handling is broken. It's one of the reasons I recommend switching to Python 3.
In Python 2, the string type was designed to represent both text and binary strings. So chr() is used to convert an integer to a byte. It's not really related to text, or ASCII, or ISO-8859-1. The result is a binary stream of bytes:
binary_command = chr(100) + chr(200) + chr(10)
device.write(binary_command)
# ...and so on
In Python 2.6 and 2.7, the bytes name exists for forward compatibility with Python 3, and it is simply an alias for str.
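For comparison, a sketch of how the same three-byte command could be built in Python 3, where chr() returns text and bytes are a separate type. The device object from the snippet above is hypothetical and is not used here:

```python
# Build the command directly from integer values.
binary_command = bytes([100, 200, 10])

print(binary_command)        # b'd\xc8\n'
print(len(binary_command))   # 3

# A bytearray can be used when the command is assembled piece by piece.
buf = bytearray()
for value in (100, 200, 10):
    buf.append(value)
print(bytes(buf) == binary_command)  # True
```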

How do I create unicode characters with variable numbers?

Basically what I want to do is print u'\u1001', but I do not want the 1001 hardcoded. Is there a way I can use a variable or string for this? Also, it would be nice if I could retrieve this code again, when I use the output as input.
According to the Python 2 documentation on Unicode:
One-character Unicode strings can also be created with the unichr()
built-in function, which takes integers and returns a Unicode string
of length 1 that contains the corresponding code point. The reverse
operation is the built-in ord() function that takes a one-character
Unicode string and returns the code point value:
unichr(40960)
results in the character:
u'\ua000'
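In Python 3 there is no unichr(); chr() covers the whole Unicode range. A minimal sketch of the round trip, where the variable names are only for illustration:

```python
code_point = int('1001', 16)   # from a string of hex digits; 0x1001 works too

ch = chr(code_point)           # build the one-character string
print(ch == '\u1001')          # True

# ord() retrieves the code point again from the character.
print(hex(ord(ch)))            # 0x1001
```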
