Why can't I restore the packed data when I use bytes()? - Python

import struct
x = struct.pack('i', 10)
print(x)
result:
b'\n\x00\x00\x00'
If I want to convert b'\n\x00\x00\x00' to an integer, it works:
bytes_to_convert = b'\n\x00\x00\x00'
bytes_converter = int.from_bytes(bytes_to_convert, byteorder='little', signed=False)
print(bytes_converter)
result:
10
Everything works perfectly, but when it comes to the bytes() function:
print(bytes('10', 'utf-16'))
If I want to convert it to an integer, I get this value: 206161706751
What is the difference between using bytes() and using struct to pack data into bytes? The 'i' I used in struct means 4 bytes, and bytes() also uses 4 bytes, so what is the difference?

10 is a number ("ten"), which is stored in your computer as 00000000 00001010
'10' is a string, consisting of the symbols 1 and 0. It is stored as 00110001 00110000 (the ASCII codes for '1' and '0').
From the computer's standpoint, there's nothing in common between these two pieces of data. Yes, the string '10' happens to be a name for the number ten (in our language, at least), but this connection only exists in our brain. The computer is not aware of it.
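To make the difference concrete, here is a small sketch (only the standard library, run on a typical little-endian machine) showing what the two calls actually produce: struct.pack stores the number itself, while bytes('10', 'utf-16') encodes the text '10', including a UTF-16 byte order mark, which is why the resulting integer looks nothing like 10.
import struct

print(struct.pack('i', 10))        # b'\n\x00\x00\x00' -> the number 10 as 4 raw bytes
encoded = bytes('10', 'utf-16')    # encodes the text '10', not the number 10
print(encoded)                     # b'\xff\xfe1\x000\x00' on a little-endian machine: BOM plus '1' and '0' as UTF-16 code units (6 bytes)
print(int.from_bytes(encoded, byteorder='little'))   # 206161706751, the value from the question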

Related

Convert bytes string literal to integer [duplicate]

I receive a 32-bit number over the serial line, using num = ser.read(4). Checking the value of num in the shell returns something like a very unreadable b'\xcbu,\x0c'.
I can check against the ASCII table to find the values of "u" and ",", and determine that the hex value of the received number is actually equal to "cb 75 2c 0c", or in the format that Python outputs, it's b'\xcb\x75\x2c\x0c'. I can also type the value into a calculator and convert it to decimal (or run int(0xcb752c0c) in Python), which returns 3413453836.
How can I do this conversion from a binary string literal to an integer in Python?
I found two alternatives to solve this problem.
Using the int.from_bytes(bytes, byteorder, *, signed=False) method
Using the struct.unpack(format, buffer) from the builtin struct module
Using int.from_bytes
Starting from Python 3.2, you can use int.from_bytes.
The second argument, byteorder, specifies the endianness of your bytestring. It can be either 'big' or 'little'. You can also use sys.byteorder to get your host machine's native byte order.
from the docs:
The byteorder argument determines the byte order used to represent the integer. If byteorder is "big", the most significant byte is at the beginning of the byte array. If byteorder is "little", the most significant byte is at the end of the byte array. To request the native byte order of the host system, use sys.byteorder as the byte order value.
int.from_bytes(bytes, byteorder, *, signed=False)
Code applied to your case:
>>> import sys
>>> int.from_bytes(b'\x11', byteorder=sys.byteorder)
17
>>> bin(int.from_bytes(b'\x11', byteorder=sys.byteorder))
'0b10001'
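Applied to the bytes from the question (assuming, as the question does, that the most significant byte arrives first and the value is unsigned):
>>> int.from_bytes(b'\xcb\x75\x2c\x0c', byteorder='big')
3413453836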
Here is the official demonstrative code from the docs:
>>> int.from_bytes(b'\x00\x10', byteorder='big')
16
>>> int.from_bytes(b'\x00\x10', byteorder='little')
4096
>>> int.from_bytes(b'\xfc\x00', byteorder='big', signed=True)
-1024
>>> int.from_bytes(b'\xfc\x00', byteorder='big', signed=False)
64512
>>> int.from_bytes([255, 0, 0], byteorder='big')
16711680
Using the struct.unpack method
The function you need to achieve your goal is struct.unpack.
To understand how to use it, we need to understand the parameters it takes and their impact on the result.
struct.unpack(format, buffer)
Unpack from the buffer buffer (presumably packed by pack(format, ...)) according to the format string format. The result is a tuple even if it contains exactly one item. The buffer’s size in bytes must match the size required by the format, as reflected by calcsize().
The buffer is the bytes we pass in, and the format describes how the bytestring should be read.
The format is a string of format characters that specify things such as endianness, C type, and byte size.
from the docs:
Format characters have the following meaning; the conversion between C and Python values should be obvious given their types. The ‘Standard size’ column refers to the size of the packed value in bytes when using standard size; that is, when the format string starts with one of '<', '>', '!' or '='. When using native size, the size of the packed value is platform-dependent.
This table shows the format characters currently available in Python 3.10.6:

Format | C type             | Standard size
x      | pad byte           |
c      | char               | 1
b      | signed char        | 1
B      | unsigned char      | 1
?      | bool               | 1
h      | short              | 2
H      | unsigned short     | 2
i      | int                | 4
I      | unsigned int       | 4
l      | long               | 4
L      | unsigned long      | 4
q      | long long          | 8
Q      | unsigned long long | 8
n      | ssize_t            |
N      | size_t             |
f      | float              | 4
d      | double             | 8
s      | char[]             |
p      | char[]             |
P      | void*              |
And here is the table of byte order characters:

Character | Byte order             | Size     | Alignment
@         | native                 | native   | native
=         | native                 | standard | none
<         | little-endian          | standard | none
>         | big-endian             | standard | none
!         | network (= big-endian) | standard | none
Examples
Here is an example of how you can use it:
>>> import struct
>>> format_chars = '<i' #4 bytes, endianness is 'little'
>>> struct.unpack(format_chars,b"f|cs")
(1935899750,)
Check the built-in struct module: https://docs.python.org/3/library/struct.html
In your case, it should probably be something like:
import struct
struct.unpack(">I", b'\xcb\x75\x2c\x0c')
but it depends on endianness and signed/unsigned, so do read the entire doc.
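For reference, interpreting those four bytes as a big-endian unsigned int reproduces the value worked out by hand in the question:
>>> import struct
>>> struct.unpack(">I", b'\xcb\x75\x2c\x0c')
(3413453836,)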

Pack into C types and obtain the binary value back

I'm using the following code to pack an integer into an unsigned short:
import struct

raw_data = 40
# Pack into little-endian
data_packed = struct.pack('<H', raw_data)
Now I'm trying to unpack the result as follows. I use utf-16-le since the data is encoded as little-endian.
import binascii

def get_bin_str(data):
    bin_asc = binascii.hexlify(data)
    result = bin(int(bin_asc.decode("utf-16-le"), 16))
    trimmed_res = result[2:]
    return trimmed_res
print(get_bin_str(data_packed))
Unfortunately, it throws the following error,
result = bin(int(bin_asc.decode("utf-16-le"), 16))
ValueError: invalid literal for int() with base 16: '㠲〰'
How do I properly decode the bytes in little-endian to binary data properly?
Use unpack to reverse what you packed. The data isn't UTF-encoded so there is no reason to use UTF encodings.
>>> import struct
>>> data_packed = struct.pack('<H', 40)
>>> data_packed.hex() # the two little-endian bytes are 0x28 (40) and 0x00 (0)
'2800'
>>> data = struct.unpack('<H',data_packed)
>>> data
(40,)
unpack returns a tuple, so index it to get the single value
>>> data = struct.unpack('<H',data_packed)[0]
>>> data
40
To print in binary, use string formatting. Either of these works well. bin() doesn't let you specify the number of binary digits to display, and the 0b prefix needs to be removed if not desired.
>>> format(data,'016b')
'0000000000101000'
>>> f'{data:016b}'
'0000000000101000'
You have not said what you are trying to do, so let's assume your goal is to educate yourself. (If you are trying to pack data that will be passed to another program, the only reliable test is to check if the program reads your output correctly.)
Python does not have an "unsigned short" type, so the output of struct.pack() is a bytes object. To see what's in it, just print it:
>>> data_packed = struct.pack('<H', 40)
>>> print(data_packed)
b'(\x00'
What's that? Well, the character (, which is decimal 40 in the ASCII table, followed by a null byte. If you had used a number that does not map to a printable ASCII character, you'd see something less surprising:
>>> struct.pack("<H", 11)
b'\x0b\x00'
Where 0b is 11 in hex, of course. Wait, I specified "little-endian", so why is my number on the left? The answer is, it's not. Python prints the byte string left to right because that's how English is written, but that's irrelevant. If it helps, think of strings as growing upwards: from low memory locations to high memory. The least significant byte comes first, which makes this little-endian.
Anyway, you can also look at the bytes directly:
>>> print(data_packed[0])
40
Yup, it's still there. But what about the bits, you say? For this, use bin() on each of the bytes separately:
>>> bin(data_packed[0])
'0b101000'
>>> bin(data_packed[1])
'0b0'
The two set bits you see are worth 32 and 8. Your number was less than 256, so it fits entirely in the low byte of the short you constructed.
What's wrong with your unpacking code?
Just for fun let's see what your sequence of transformations in get_bin_str was doing.
>>> binascii.hexlify(data_packed)
b'2800'
Um, all right. Not sure why you converted to hex digits, but now you have 4 bytes, not two. (28 is the number 40 written in hex, the 00 is for the null byte.) In the next step, you call decode and tell it that these 4 bytes are actually UTF-16; there's just enough for two unicode characters, let's take a look:
>>> b'2800'.decode("utf-16-le")
'㠲〰'
In the next step Python finally notices that something is wrong, but by then it does not make much difference because you are pretty far away from the number 40 you started with.
To correctly read your data as a UTF-16 character, call decode directly on the byte string you packed.
>>> data_packed.decode("utf-16-le")
'('
>>> ord('(')
40

python - pack exactly 24 bits into unsigned/signed int

I need to send exactly 24 bits over an Ethernet connection, and the program on the other end expects an unsigned int in some cases and a signed int in others (C types). I want to use the struct module, but it doesn't have a type with 3 bytes built in (like a uint24_t).
Similar questions to this have been asked, but the answer always involves sending 4 bytes and padding the data packet with zeros. I cannot do this, however, since I am not writing the program which is receiving the data, and it expects exactly 24 bits.
I am very new at this type of programming, so help is appreciated!
Using the struct module, create a byte string that contains exactly three 8-bit bytes.
import struct
# 24 bits: 01010101 10101010 11110000
byte1 = 0x55
byte2 = 0xaa
byte3 = 0xf0
data = struct.pack("BBB", byte1, byte2, byte3)
Depending on how you get the bits to send, you can also define the byte string directly:
data = b'\x55\xaa\xf0'
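If you start from an integer value rather than three separate bytes, a minimal sketch using int.to_bytes (the variable names and the choice of big-endian byte order here are assumptions; use whatever order the receiving program expects) looks like this:
value = 0x55AAF0                                     # the same 24 bits as above
data = value.to_bytes(3, byteorder='big')            # unsigned 24-bit value
print(data)                                          # b'U\xaa\xf0'
signed_data = (-1000).to_bytes(3, byteorder='big', signed=True)
print(signed_data)                                   # b'\xff\xfc\x18' (24-bit two's complement)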

How can I convert two bytes of an integer back into an integer in Python?

I am currently using an Arduino that's outputting some integers (int) through Serial (using pySerial) to a Python script that I'm writing for the Arduino to communicate with X-Plane, a flight simulation program.
I managed to separate the original into two bytes so that I could send it over to the script, but I'm having a little trouble reconstructing the original integer.
I tried using basic bitwise operators (<<, >> etc.) as I would have done in a C++-like program, but it does not seem to be working.
I suspect it has to do with data types. I may be using integers with bytes in the same operations, but I can't really tell which type each variable holds, since you don't really declare variables in Python, as far as I know (I'm very new to Python).
self.pot = self.myline[2] << 8
self.pot |= self.myline[3]
You can use the struct module to convert between integers and representation as bytes. In your case, to convert from a Python integer to two bytes and back, you'd use:
>>> import struct
>>> struct.pack('>H', 12345)
b'09'
>>> struct.unpack('>H', b'09')
(12345,)
The first argument to struct.pack and struct.unpack represents how you want your data to be formatted. Here, I ask for it to be in big-endian mode by using the > prefix (you can use < for little-endian, or = for native) and then I say there is a single unsigned short (16-bit integer), represented by the H.
Other possibilities are b for a signed byte, B for an unsigned byte, h for a signed short (16 bits), i for a signed 32-bit integer, and I for an unsigned 32-bit integer. You can get the complete list by looking at the documentation of the struct module.
For example, using Big Endian encoding:
int.from_bytes(my_bytes, byteorder='big')
What you have seems basically like it should work, assuming the data stored in myline has the high byte first:
myline = [0, 1, 2, 3]
pot = myline[2]<<8 | myline[3]
print('pot: {:d}, 0x{:04x}'.format(pot, pot))  # outputs "pot: 515, 0x0203"
Otherwise, if it's low-byte first you'd need to do the opposite way:
myline = [0, 1, 2, 3]
pot = myline[3]<<8 | myline[2]
print('pot: {:d}, 0x{:04x}'.format(pot, pot))  # outputs "pot: 770, 0x0302"
This totally works:
value = 500
first = value & 0xff             # 244
second = value >> 8              # 1
result = (second << 8) + first   # 500
If you are not sure of the types in 'myline', please check the Stack Overflow question "How to determine the variable type in Python?".
To convert a byte or char to the number it represents, use ord(). Here's a simple round trip from an int to bytes and back (Python 2):
>>> number = 3**9
>>> hibyte = chr(number / 256)
>>> lobyte = chr(number % 256)
>>> hibyte, lobyte
('L', '\xe3')
>>> print number == (ord(hibyte) << 8) + ord(lobyte)
True
If your myline variable is string or bytestring, you can use the formula in the last line above. If it somehow is a list of integers, then of course you don't need ord.
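In Python 3, where ser.read() returns a bytes object and indexing it already gives integers, a minimal sketch of the reconstruction (assuming the high byte comes first, as in the original code) would be:
myline = bytes([0, 1, 2, 3])                        # e.g. the 4 bytes read from serial
pot = int.from_bytes(myline[2:4], byteorder='big')  # combine bytes 2 and 3, high byte first
print(pot)                                          # 515, i.e. 0x0203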

Convert little endian string to integer [duplicate]

I have read samples out of a wave file using the wave module, but it gives the samples as a string. Since it comes from a wave file, it's little-endian (for example, \x00).
What is the easiest way to convert this into a python integer, or a numpy.int16 type? (It will eventually become a numpy.int16, so going directly there is fine).
Code needs to work on little endian and big endian processors.
The struct module converts packed data to Python values, and vice-versa.
>>> import struct
>>> struct.unpack("<h", "\x00\x05")
(1280,)
>>> struct.unpack("<h", "\x00\x06")
(1536,)
>>> struct.unpack("<h", "\x01\x06")
(1537,)
"h" means a short int, or 16-bit int. "<" means use little-endian.
struct is fine if you have to convert one or a small number of 2-byte strings to integers, but array and numpy itself are better options. Specifically, numpy.fromstring (called with the appropriate dtype argument) can directly convert the bytes from your string to an array of that dtype; in modern NumPy, numpy.frombuffer is the preferred spelling. If numpy.little_endian is false, you'll then have to swap the bytes -- see here for more discussion, but basically you'll want to call the byteswap method on the array object you just built with fromstring.
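As a sketch of that numpy route (assuming the raw sample bytes are available as a bytes object; passing an explicit little-endian dtype such as '<i2' sidesteps byte-swapping concerns on big-endian hosts):
import numpy as np

raw = b"\x00\x04\x01\x05\x03\x04"            # little-endian 16-bit samples
samples = np.frombuffer(raw, dtype="<i2")    # '<i2' = little-endian signed 16-bit integer
print(samples)                               # [1024 1281 1027]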
Kevin Burke's answer to this question works great when your binary string represents a single short integer, but if your string holds binary data representing multiple integers, you will need to add an additional 'h' for each additional integer that the string represents.
For Python 2
Convert Little Endian String that represents 2 integers
import struct
iValues = struct.unpack("<hh", "\x00\x04\x01\x05")
print(iValues)
Output: (1024, 1281)
Convert Little Endian String that represents 3 integers
import struct
iValues = struct.unpack("<hhh", "\x00\x04\x01\x05\x03\x04")
print(iValues)
Output: (1024, 1281, 1027)
Obviously, it's not realistic to always guess how many "h" characters are needed, so:
import struct
# A string that holds some unknown quantity of integers in binary form
strBinary_Values = "\x00\x04\x01\x05\x03\x04"
# Calculate the number of integers that are represented by binary string data
iQty_of_Values = len(strBinary_Values)/2
# Produce the string of required "h" values
h = "h" * int(iQty_of_Values)
iValues = struct.unpack("<"+h, strBinary_Values)
print(iValues)
Output: (1024, 1281, 1027)
For Python 3
import struct
# A string that holds some unknown quantity of integers in binary form
strBinary_Values = "\x00\x04\x01\x05\x03\x04"
# Calculate the number of integers that are represented by binary string data
iQty_of_Values = len(strBinary_Values)/2
# Produce the string of required "h" values
h = "h" * int(iQty_of_Values)
iValues = struct.unpack("<"+h, bytes(strBinary_Values, "utf8"))
print(iValues)
Output: (1024, 1281, 1027)
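In Python 3 it is usually simpler to keep the data as a bytes literal in the first place and use a repeat count in the format string instead of multiplying "h":
import struct

data = b"\x00\x04\x01\x05\x03\x04"            # bytes literal, no text decoding needed
count = len(data) // 2                        # number of 16-bit values
values = struct.unpack("<%dh" % count, data)  # e.g. "<3h" for three little-endian shorts
print(values)                                 # (1024, 1281, 1027)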
int(value[::-1].hex(), 16)
By example:
value = b'\xfd\xff\x00\x00\x00\x00\x00\x00'
print(int(value[::-1].hex(), 16))
65533
[::-1] reverses the bytes (little-endian), .hex() transforms them into a hex string, and int(..., 16) parses that hex string as a base-16 integer.
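The same result can be obtained directly with int.from_bytes, which avoids the reverse-and-hex round trip:
value = b'\xfd\xff\x00\x00\x00\x00\x00\x00'
print(int.from_bytes(value, byteorder='little'))   # 65533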
