How does ord() convert bytes to int

How does ord() convert bytes to int - python

I'm using PySerial and trying to receive a byte array {1,2,3,4,5,6,7,8,9,10,11} sent from a MCU. Here's the array I get from PySerial
b'\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b
By looking at the first few "elements" (and the last one as well), I first thought they are just hex numbers until I saw 't' and 'n'.
So I tried to see the output of ord(b'\t') and it indeed gives me the int number 9. I'm a bit confused since ord() is supposed to return the unicode.
Why is 9 represented as b'\t' and 10 as b'\n'? What is this representation and can I find like a conversion table anywhere?
Thank you!

Related

How do I hash integers and strings inputs using murmurhash3

I'm looking to get a hash value for string and integer inputs.
Using murmurhash3, I'm able to do it for strings but not integers:
pip install murmurhash3
import mmh3
mmh3.hash(34)
Returns the following error:
TypeError: a bytes-like object is required, not 'int'
I could convert it to bytes like this:
mmh3.hash(bytes(34))
But then I'll get an error message if the input is string
How do I overcome this without converting the integer to string?

How do I overcome this without converting the integer to string?
You can't. Or more precisely, you need to convert it to bytes or str in some way, but it needn't be a human-readable text form like b'34'/'34'. A common approach on Python 3 would be:
my_int = 34 # Or some other value
my_int_as_bytes = my_int.to_bytes((my_int.bit_length() + 7) // 8, 'little')
which makes a minimalist raw bytes representation of the original int (regardless of length); for 34, you'd get b'"' (because it only takes one byte to store it, so you're basically getting a bytes object with its ordinal value), but for larger ints it still works (unlike mucking about with chr), and it's always as small as possible (getting 8 bits of data per byte, rather than a titch over 3 bits per byte as you'd get converting to a text string).
If you're on Python 2 (WHY?!? It's been end-of-life for nearly a year), int.to_bytes doesn't exist, but you can fake it with moderate efficiency in various ways, e.g. (only handling non-negative values, unlike to_bytes which handles signed values with a simple flag):
from binascii import unhexlify
my_int_as_bytes = unhexlify('%x' % (my_int,))

to_bytes function returning odd value

I am having trouble with the to_bytes function of integer type. The values bigger than 18200000 are giving me a weird byte array as an output.
I am using python 3.5 on raspberry pi. The value is not exactly 18200000 but close.
The way I call the function is like this:
frequency = 20000000
print(frequency.to_bytes(7,byteorder='big'))
The expected result would be b'\x01\x31\x2D\x00'
What I get is b'\x011-\x00'.

For bytes that are printable ASCII characters, python will display the corresponding character. \x31 is the character 1, and \x2D is -.

How can I extract mixed binary and ascii values from a bytes string like I did in 2.x?

The following represents a binary image extracted from a file (spaces inserted between bytes to make reading easier). File is opened with 'rb' mode.
01 77 33 9F 41 42 43 44 00 11 11 11
In Python 2.7, I read it as a character string and I use ord() to extract the binary values and then I can extract or even search the string for a specific text value (such as the "ABCD" in characters 4-7). The binary bytes can be anything from 0-FF. I've been putting off conversion to python 3 partly because of this.
I need to be able, in Python 3, to treat a string of bytes as a mixture of binary and ascii (not unicode) values. The format is not fixed, it consists of data structures. For example, the 33 in byte 2 might be a record length that tells me where the start of the next record is. In other words, I can't just say that I know the text string is always in location 4.
I don't write the file, I just use it, so changing it is not an option.
I've seen lots of examples of using b' and other things to convert fixed strings but I need a way to intermix these values, extracting bytes, 2-byte to 8-byte values as 16-bit to 64-bit words, and extracting/searching for ASCII strings within the larger string.
The byte/character separation in Python 3 seems somewhat inflexible for what I need. I'm sure there's a way to do this I just haven't found an example or an answered question that seems to cover this case.
This is a simplified example, I can't provide real data (it's proprietary) but this illustrates the problem. The real files may be short (<1K) or huge (>100K), containing multiple records of different sizes.
Is there an easy, straightforward way to essentially replicate the functionality I have in Python 2.7?
This is on Windows.
Thanks

I need to be able, in Python 3, to treat a string of bytes as a mixture of binary and ascii (not unicode) values. The format is not fixed, it consists of data structures. For example, the 33 in byte 2 might be a record length that tells me where the start of the next record is. In other words, I can't just say that I know the text string is always in location 4.
Read the file in binary mode, as you are doing. This produces a bytes object, which in 3.x is not the same as a str (as it would be in 2.x).
Interpret the bytes as bytes, as needed, to figure out the general structure of the data. Slicing the bytes produces another bytes as before; indexing produces an int with the numeric value of that single byte (not as before) - no ord required.
When you have determined a subset of the bytes that represent a string (let's say for convenience that you have sliced it out), convert to string using the appropriate encoding: e.g. str(my_bytes, 'ascii'). Note that ASCII will not handle byte values 0x80 through 0xFF; especially with binary-ish legacy file formats, there's a good chance your data is actually something like Latin-1: str(my_bytes, 'iso-8859-1').
search the string for a specific text value
You can search at either the text or the byte level - bytes objects support the in operator, searching for either a subsequence of bytes or a single integer value. Whether it makes more sense to search before or after string conversion will depend on what you are doing.
using b' and other things to convert fixed strings
b'' is just the syntax for a literal bytes object. It's what you'll see if you ask for the repr of what you read from the file. Prefixing a b onto an existing string literal in your code isn't really "converting" anything, but replacing it with the value you should have had in the first place.
2-byte to 8-byte values as 16-bit to 64-bit words
The documentation says it at least as well as I could:
>>> help(int.from_bytes)
Help on built-in function from_bytes:
from_bytes(...) method of builtins.type instance
int.from_bytes(bytes, byteorder, *, signed=False) -> int
Return the integer represented by the given array of bytes.
The bytes argument must be a bytes-like object (e.g. bytes or bytearray).
The byteorder argument determines the byte order used to represent the
integer. If byteorder is 'big', the most significant byte is at the
beginning of the byte array. If byteorder is 'little', the most
significant byte is at the end of the byte array. To request the native
byte order of the host system, use `sys.byteorder' as the byte order value.
The signed keyword-only argument indicates whether two's complement is
used to represent the integer.

Big Binary Code into File in Python

I have been working on a program and I have been trying to convert a big binary file (As a string) and pack it into a file. I have tried for days to make such thing possible. Here is the code I had written to pack the large binary string.
binaryRecieved="11001010101....(Shortened)"
f=open(fileName,'wb')
m=long(binaryRecieved,2)
struct.pack('i',m)
f.write(struct.pack('i',m))
f.close()
quit()
I am left with the error
struct.pack('i',x)
struct.error: integer out of range for 'i' format code
My integer is out of range, so I was wondering if there is a different way of going about with this.
Thanks

Convert your bit string to a byte string: see for example this question Converting bits to bytes in Python. Then pack the bytes with struct.pack('c', bytestring)

For encoding m in big-endian order (like "ten" being written as "10" in normal decimal use) use:
def as_big_endian_bytes(i):
out=bytearray()
while i:
out.append(i&0xff)
i=i>>8
out.reverse()
return out
For encoding m in little-endian order (like "ten" being written as "01" in normal decimal use) use:
def as_little_endian_bytes(i):
out=bytearray()
while i:
out.append(i&0xff)
i=i>>8
return out
both functions work on numbers - like you do in your question - so the returned bytearray may be shorter than expected (because for numbers leading zeroes do not matter).
For an exact representation of a binary-digit-string (which is only possible if its length is dividable by 8) you would have to do:
def as_bytes(s):
assert len(s)%8==0
out=bytearray()
for i in range(0,len(s)-8,8):
out.append(int(s[i:i+8],2))
return out

In struct.pack you have used 'i' which represents an integer number, which is limited. As your code states, you have a long output; thus, you may want to use 'd' in stead of 'i', to pack your data up as double. It should work.
See Python struct for more information.

Start at offset from beginning of file and unpack 4 byte word

With Python, given an offset say 250 bytes, how would I jump to this position in a file and store a 32bit binary value?
My issue is read() returns a string and I'm not sure if I'm able to properly advance to the valid offset having done that. Also, experimenting with struct.unpack() it's demanding a length equivalent to the specified format. How do I grab only the immediate following data according to what's expected of the specified format? And what's the format for a 32bit int?
Ex. I wrote a string >32 characters and thought I could grab the initial 32 bits and store them as a single 32 bit int by using '<qqqq', this was incorrect needless to say.

with open("input.bin","rb") as f:
f.seek(250) #offset
print struct.unpack("<l",f.read(4)) #grabs one little endian 32 bit long
if you wanted 4, 32 bit ints you would use
print struct.unpack("<llll",f.read(16))
if you just want to grab the next 32bit int
print struct.unpack_from("<l",f)[0]

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.