Problem with the unpack function in Python

Excuse me, my English is not good.
I am trying to decode some binary messages with unpack in Python, but I have a problem.
The first message looks like this:
from struct import *
firstMessage = b'\x00\x00\x00\x00\xff\xff\xff\x00' #without tags
decodeFirstMessage = unpack('1q',firstMessage)
print(decodeFirstMessage[0])
The second message looks like this:
from struct import *
secondMessage = b'*xxyyzz \x03 \x00\x00\x00\x00\xff\xff\xff\x00 tago1;' #with tags
decodeSecondMessage = unpack('7s1s1B1sq1s6s',secondMessage)
print(decodeSecondMessage[0])
For the first code I get:
72057589742960640
as the answer.
For the second code I get:
unpack requires a buffer of 31 bytes
as the answer.
I tried to check the size of the format passed to unpack with this code:
print(calcsize('1q'))
print(calcsize('7s1s1B1sq1s6s'))
I get:
8
and
31
I counted the bytes myself and get
8
and
25
When I replace q with b or h in the format, calcsize() gives the correct value of 18 or 19 bytes,
but with l and q I have a problem.
What is wrong in my code, and how can I solve this, please?

The reason for this is padding.
Read the whole section "Byte Order, Size, and Alignment" in the struct documentation.
An example:
>>> print(calcsize('1s1q'))
16
>>> print(calcsize('=1s1q'))
9

The short version: use this format instead:
"=7s1s1B1s1q1s6s"
The longer version: alignment. The default prefix is @, meaning "native" byte order, size, and alignment. With it, the format is interpreted to match the layout of the corresponding C struct on the platform, including padding bytes. Using = switches the format to standard sizes and turns off alignment.
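To illustrate the difference, here is a minimal sketch using the second message from the question. It unpacks with `<` rather than `=`; that is an assumption made here so the integer value does not depend on the machine's byte order (both prefixes use standard sizes and no alignment).

```python
from struct import calcsize, unpack

second_message = b'*xxyyzz \x03 \x00\x00\x00\x00\xff\xff\xff\x00 tago1;'

# Native mode may insert padding before the 8-byte 'q' (31 on typical 64-bit platforms).
native_size = calcsize('7s1s1B1sq1s6s')
# Standard sizes, no padding: exactly the 25 bytes counted by hand.
packed_size = calcsize('=7s1s1B1sq1s6s')

# '<' = little-endian, standard sizes, no alignment.
fields = unpack('<7s1s1B1sq1s6s', second_message)
print(native_size, packed_size, fields)
```

With the padding gone, the format size matches the message length, so unpack no longer complains about the buffer size.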

Related

Shared memory between C and python

I want to share memory between a program in C and another in python.
The c program uses the following structure to define the data.
struct Memory_LaserFrontal {
char Data[372]; // original data
float Med[181]; // Measurements in [m]
charD; // 'I': Invalid -- 'V': Valid
charS; // 'L': Clean -- 'S': Dirty
char LaserStatus[2];
};
From Python I have managed to read the shared memory using sysv_ipc, but the result has no structure and is seen as a flat array of bytes. How can I restructure it?
python code:
from time import sleep
import sysv_ipc
# Create shared memory object
memory = sysv_ipc.SharedMemory(1234)
# Read value from shared memory
memory_value = memory.read()
print (memory_value)
print (len(memory_value))
while True:
    memory_value = memory.read()
    print (float(memory_value[800]))
    sleep(0.1)
I have captured and printed the data in Python. When I change the sensor reading, the data read also changes, which confirms that it corresponds to the data in the sensor's shared memory. But without the proper structure I can't use the data.
You need to unpack your binary data structure into Python types. The Python modules struct and array can do this for you.
import struct
import array
NB: C compilers may insert padding bytes so that each member sits at an offset matching its alignment requirement. For this struct that is unlikely to matter: Data occupies 372 bytes (a multiple of 4), so the floats are already aligned, and char members have no alignment requirement. If your compiler does pad, you may have to experiment with the struct format character 'x' between the appropriate parts of your struct. With a byte-order prefix such as '=' or '<', Python's struct module assumes no padding, so you need to tell it about any padding explicitly. See my note at the very end for a guess at what padded data might look like. Again, per @Max's comment, this is unlikely.
NB: I think the members charD and charS are really char D; and char S;
Assuming you want the floats as a Python list or equivalent, we have to do some more work with the Python array module. The same goes for the char[] Data.
# Get the initial char array - you can turn it into a string if you need to later.
my_chars = array.array("b") # f for float, b for signed char, etc.
my_chars.frombytes(memory_value[:372]) # it happens that 372 C chars is 372 bytes.
Data = my_chars.tolist() # a list of ints, one per byte
# advance to the member after Data
end_of_Data = struct.calcsize("372c")
# get the length in bytes that 181 floats take up
end_of_Med = struct.calcsize("181f") + end_of_Data
# now we know where the floats are
floats_as_bytes = memory_value[ end_of_Data : end_of_Med ]
# unpack the remaining parts
( D, S, LaserStatus_1, LaserStatus_2 ) = struct.unpack( "cccc", memory_value[end_of_Med:] )
Now use the array module to turn the float bytes into a Python list:
my_floats = array.array("f") # f for float, b for signed char, etc.
my_floats.frombytes(floats_as_bytes)
Med = my_floats.tolist()
Data holds the raw bytes as integers. If it is really text, convert it with your preferred string encoding. Usually .decode('utf-8') is good enough.
Data_S = memory_value[:372].decode('utf-8') # get a usable string in Data_S
Padding
struct Memory_LaserFrontal {
char Data[372]; // 372 is a multiple of 4, probably no padding
float Med[181]; // floats are 4 bytes, probably no padding
charD; // single char, expect 3 padding bytes after
charS; // single char, expect 3 padding bytes after
char LaserStatus[2]; // double char expect 2 padding bytes after.
};
If the compiler did pad every member out to 4 bytes as guessed above, the last Python line would become the following, where each 'x' marks a padding byte to skip.
( D, S, LaserStatus_1, LaserStatus_2 ) = struct.unpack( "cxxxcxxxccxx", memory_value[end_of_Med:] )
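As an alternative to slicing by hand, the whole struct can probably be unpacked in one call. The sketch below assumes no padding, little-endian floats, and that the members are really `char D;` and `char S;`. The names `LAYOUT` and `parse_laser_frame` are made up for illustration, and the sample bytes are synthetic rather than read from shared memory.

```python
import struct

# Assumed layout: char Data[372]; float Med[181]; char D; char S; char LaserStatus[2];
# '<' selects little-endian, standard sizes, and no padding (an assumption about the C side).
LAYOUT = struct.Struct('<372s181fcc2s')

def parse_laser_frame(buf):
    """Split one raw frame into (data, med, d, s, laser_status)."""
    fields = LAYOUT.unpack_from(buf)
    data = fields[0]                       # 372 raw bytes
    med = list(fields[1:182])              # 181 floats
    d, s, laser_status = fields[182:185]   # b'V' / b'I', b'L' / b'S', two status bytes
    return data, med, d, s, laser_status

# Round-trip check with synthetic data instead of real shared memory.
sample = LAYOUT.pack(b'raw', *([1.5] * 181), b'V', b'L', b'OK')
data, med, d, s, laser_status = parse_laser_frame(sample)
print(LAYOUT.size, med[0], d, s, laser_status)
```

If `LAYOUT.size` does not match `len(memory.read())`, that is a hint that the C side uses padding or a different layout.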
I always like to leave the full source code of the problem solved so others can use it if they have a similar problem.
thanks a lot all!
from time import sleep
import sysv_ipc
import struct
import array
# Create shared memory object
while True:
    memory = sysv_ipc.SharedMemory(1234)
    # Read value from shared memory
    memory_value = memory.read()
    #print (memory_value)
    #print (len(memory_value))
    # Get the initial char array - you can turn it into a string if you need to later.
    my_chars = array.array("b") # f for float, b for signed char, etc.
    my_chars.frombytes(memory_value[:372]) # it happens that 372 chars is 372 bytes.
    Data = my_chars.tolist() # a list of ints, one per byte
    # advance to the member after Data
    end_of_Data = struct.calcsize("372c")
    # get the length in bytes that 181 floats take up
    end_of_Med = struct.calcsize("181f") + end_of_Data
    # now we know where the floats are
    floats_as_bytes = memory_value[ end_of_Data : end_of_Med ]
    # unpack the remaining parts
    ( D, S, LaserStatus_1, LaserStatus_2 ) = struct.unpack( "cccc", memory_value[end_of_Med:] )
    print(len(floats_as_bytes)/4)
    a=[]
    for i in range(0,len(floats_as_bytes),4):
        a.append(struct.unpack('<f', floats_as_bytes[i:i+4]))
    print (a[0])
    sleep(0.1)

How can I find the big endian key in a message?

I am trying to read a binary message from an ESP32 through a broker; I wrote a Python script that subscribes to the topic. The message that I actually receive is:
b'\x00\x00\x00?'
This is a binary little-endian float message, but I don't have the key to decode it. Is there a way to find the decoding key based on this data?
This is my Python code:
import paho.mqtt.client as mqtt
def on_connect1(client1, userdata1, flags1, rc1):
    client1.subscribe("ESP32DevKit123/mytopic")
def on_message1(client1, userdata1, msg1):
    print(msg1.topic+" "+ "TESTENZA: "+str(msg1.payload))
client1 = mqtt.Client()
client1.username_pw_set(username="myuser",password="mypassword")
client1.on_connect = on_connect1
client1.on_message = on_message1
client1.connect("linkclient", portnumber, 60)
def twosComplement_hex(hexval):
    bits = 16 # Number of bits in a hexadecimal number format
    on_message1 = int(hexval, bits)
    if on_message1 & (1 << (bits-1)):
    on_message1 -= 1 << bits
    return on_message1
client1.loop_forever()
It also gives me an error on the line on_message1 -= 1 << bits; Pylance says: "Expected indented block". Any solutions?
The data you provided is b'\x00\x00\x00?' - I'm going to assume that this is 0000003f (please output hex with msg1.payload.hex()).
I'll also assume that by "float binary little endian" you mean a big-endian floating point number (IEEE 754). Note that this does not match up with the algorithm you are using (two's complement). Plugging this input into an online tool indicates that the expected result ("Float - Big Endian (ABCD)") is 8.82818e-44 (it's worth checking with this tool; sometimes the encoding may not be what you think it is!).
Let's unpack this using Python (see the struct docs for more information):
>>> from struct import unpack
>>> unpack('>f', b'\x00\x00\x00\x3f')[0]
8.828180325246348e-44
Notes:
The [0] is there because unpack always returns a tuple (you can unpack more than one item from the input)
>f - the > means big-endian and the f float (standard size = 4 bytes)
The reason your original code gives the error "Expected indented block" is the lack of indentation in the line on_message1 -= 1 << bits (as it follows an if, it needs to be indented). The algorithm does not appear relevant to your task (but there may be details I'm missing).
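Since the question describes the payload as little-endian, it may be worth comparing both byte orders before settling on one. A minimal sketch, assuming the payload is a single 4-byte IEEE 754 float:

```python
from struct import unpack

payload = b'\x00\x00\x00?'   # the same bytes as b'\x00\x00\x00\x3f'

as_big_endian = unpack('>f', payload)[0]      # bytes read as 0x0000003f
as_little_endian = unpack('<f', payload)[0]   # bytes read as 0x3f000000

print(payload.hex(), as_big_endian, as_little_endian)
```

The little-endian reading gives exactly 0.5, while the big-endian reading gives a denormal value near 8.8e-44. Which one is right depends on how the ESP32 code packs the value, which this sketch cannot tell you.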

Decode emoji into two (or more) code points, using standard libraries

I'd like to be able to decode an emoji into its corresponding code points as seen here. I'm limited to using standard libraries in 2.7.
For example:
🇲🇩 -> U+1F1F2 U+1F1E9
I've managed to get the first code point using this code, but I can't figure out how to pull the second. Some emoji have even more code points.
to_decode = u'🇲🇩'
code = ord(to_decode[0])
if 0xd800 <= code <= 0xdbff:
    code = (code - 0xd800) * 1024 + (ord(to_decode[1]) - 0xdc00) + 0x010000
print(hex(code))
A combination of encode and struct.unpack can give you what you need.
>>> import struct
>>> b = to_decode.encode('utf_32_le')
>>> count = len(b) // 4
>>> count
2
>>> cp = struct.unpack('<%dI' % count, b)
>>> [hex(x) for x in cp]
['0x1f1f2', '0x1f1e9']
This is sort of a hack, but you can use the repr of the unicode string:
>>> repr(to_decode)
"u'\\U0001f1f2\\U0001f1e9'"
so:
>>> hex(int(repr(to_decode)[4:12], 16))
'0x1f1f2'
and
>>> hex(int(repr(to_decode)[14:22], 16))
'0x1f1e9'
You must extend this method to support emojis with more than two code points. You may consider using a combination of the above with .split("\\U").
For this problem, in Python 3 you can simply use list(), which breaks a Unicode string into its constituent code points:
to_decode = u'🇲🇩'
list(to_decode)
['🇲', '🇩']
As an example of what you can do with this, I created a unicode visualization of the Bengali Alphabet
https://www.kaggle.com/jamesmcguigan/unicode-visualization-of-the-bengali-alphabet
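To get from the individual characters to the `U+XXXX` notation asked for in the question, something like the following sketch may help. The direct iteration assumes Python 3; on a narrow Python 2.7 build, iterating over the string yields surrogate halves instead, so the UTF-32 route is the safer one there.

```python
import struct

to_decode = u'\U0001F1F2\U0001F1E9'   # the flag emoji from the question, written with escapes

# Python 3 iterates over code points directly.
direct = ['U+%X' % ord(ch) for ch in to_decode]

# Encoding to UTF-32 gives exactly 4 bytes per code point.
raw = to_decode.encode('utf-32-le')
via_utf32 = ['U+%X' % cp for cp in struct.unpack('<%dI' % (len(raw) // 4), raw)]

print(direct, via_utf32)
```

Both lists scale to emoji built from more than two code points, since nothing in the code depends on the count.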

Decoding Base64 string

I'm working with Python to do some string decoding and I am trying to understand what this line of code does:
for irradiance_data in struct.iter_unpack("qHHHHfff", irradiance_list_bytes):
    print(irradiance_data)
In my case irradiance_list_bytes is something like this
"\xf5R\x960\x00\x00\x00\x009\x0f\xb4\x03\x01\x00d\x00\xa7D\xd1BC\x8c\x9d\xc2\xb3\xa5\xf0\xc0\xaer\x990\x00\x00\x00\x000\x0f\xb2\x03\x01\x00d\x00\x8f+\xd1B\x81\x9c\x9d\xc2\xf7\xfb\xe6\xc0u\x96\x9c0\x00\x00\x00\x00.\x0f\xb1\x03\x01\x00d\x00\xfe\x81\xd3B\x8a\r\x9e\xc2\xb4\xe7\x01\xc1\x1a\x7f\x9f0\x00\x00\x00\x00*\x0f\xb0\x03\x01\x00d\x00Z\xf5\xd3B\xedq\x9e\xc2&\xa1\x03\xc1\x94\x82\xa20\x00\x00\x00\x00-\x0f\xb1\x03\x01\x00d\x00\xb6\x8f\xd3Bg\xdf\x9d\xc2\x00\xad\xfd\xc0#\x93\xa50\x00\x00\x00\x000\x0f\xb2\x03\x01\x00d\x00\x95n\xd4B\x1d'\x9e\xc2\x1dW\x01\xc1\xd3\xa1\xa80\x00\x00\x00\x001\x0f\xb2\x03\x01\x00d\x00\x1d\xbc\xd3B\xeb\xca\x9d\xc2s\xbf\xf2\xc0.\xaf\xab0\x00\x00\x00\x001\x0f\xb2\x03\x01\x00d\x00\x13\xad\xd4BJx\x9d\xc2G(\xfb\xc0.\xc2\xae0\x00\x00\x00\x007\x0f\xb4\x03\x01\x00d\x00\xd1\xc9\xd4BS\xb8\x9d\xc2\xf0\xd9\xf8\xc0"
And the error message is
AttributeError: 'module' object has no attribute 'iter_unpack'
I believe I have to change "qHHHHfff" to another format string, but I don't understand how.
The complete code is here:
import os
import glob
import exiftool
import base64
import struct
irradiance_list_tag = 'XMP:IrradianceList'
irradiance_calibration_measurement_golden_tag = 'XMP:IrradianceCalibrationMeasurementGolden'
irradiance_calibration_measurement_tag = 'XMP:IrradianceCalibrationMeasurement'
tags = [ irradiance_list_tag, irradiance_calibration_measurement_tag ]
directory = '/home/stagiaire/Bureau/AAAA/'
channels = [ 'RED' ]
index = 0
for channel in channels:
    files = glob.glob(os.path.join(directory, '*' + channel + '*'))
    with exiftool.ExifTool() as et:
        metadata = et.get_tags_batch(tags, files)
        for file_metadata in metadata:
            irradiance_list = file_metadata[irradiance_list_tag]
            irradiance_calibration_measurement = file_metadata[irradiance_calibration_measurement_tag]
            irradiance_list_bytes = base64.b64decode(irradiance_list)
            print(files[index])
            index += 1
            for irradiance_data in struct.iter_unpack("qHHHHfff", irradiance_list_bytes):
                print(irradiance_data)
EDIT
So, as stated by Strubbly, this is the solution to this question:
print struct.unpack("I",x[:4])
for i in range(8):
    start = 4 + i*28
    print struct.unpack("qHHHHfff",x[start:start+28])
struct.iter_unpack is only available in Python 3 and you are using Python 2.
There is no direct equivalent. struct.unpack will unpack one lump of 28 bytes (with that format string). struct.iter_unpack will unpack multiples of 28 bytes in Python 3.
If your data was suitable for struct.iter_unpack with that format code then you could do something like this:
for i in range(0,len(x),28):
    print struct.unpack("qHHHHfff",x[i:i+28])
Unfortunately your sample data is not a multiple of 28 bytes long and so I would expect an error in Python 3 as well.
Without knowing more about your data it is hard to correct your code but, at a guess, your data might have 4 bytes of some other data at the front. That could be unpacked with something like this:
print struct.unpack("I",x[:4])
for i in range(8):
    start = 4 + i*28
    print struct.unpack("qHHHHfff",x[start:start+28])
In this example I have guessed that the first four bytes are an unsigned int but I have no way of knowing if that is correct. More information is needed.
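The chunked loop can be wrapped in a small generator that behaves roughly like struct.iter_unpack on either Python version. This is only a sketch: `iter_unpack_compat` is a made-up name, the records are synthetic, and the `<` prefix is an assumption that pins the record size to 28 bytes regardless of platform.

```python
import struct

def iter_unpack_compat(fmt, buf):
    """Yield one tuple per fixed-size record, similar to struct.iter_unpack."""
    size = struct.calcsize(fmt)
    if len(buf) % size:
        raise ValueError('buffer length %d is not a multiple of %d' % (len(buf), size))
    for offset in range(0, len(buf), size):
        yield struct.unpack_from(fmt, buf, offset)

# Synthetic records, not the data from the question.
fmt = '<qHHHHfff'
blob = (struct.pack(fmt, 1, 2, 3, 4, 5, 0.5, 1.5, 2.5) +
        struct.pack(fmt, 9, 8, 7, 6, 5, 4.0, 3.0, 2.0))
records = list(iter_unpack_compat(fmt, blob))
print(records)
```

For data with a leading header, as guessed above, the header would need to be sliced off before the remaining bytes are passed in.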

reorder byte order in hex string (python)

I want to build a small formatter in python giving me back the numeric
values embedded in lines of hex strings.
It is a central part of my formatter and should be reasonably fast, able to
format more than 100 lines/sec (each line about 100 chars).
The code below should give an example where I'm currently blocked.
'data_string_in_orig' shows the given input format. It has to be
byte swapped for each word. The swap from 'data_string_in_orig' to
'data_string_in_swapped' is needed. In the end I need the structure
access as shown. The expected result is within the comment.
Thanks in advance
Wolfgang R
#!/usr/bin/python
import binascii
import struct
## 'uint32 double'
data_string_in_orig = 'b62e000052e366667a66408d'
data_string_in_swapped = '2eb60000e3526666667a8d40'
print data_string_in_orig
packed_data = binascii.unhexlify(data_string_in_swapped)
s = struct.Struct('<Id')
unpacked_data = s.unpack_from(packed_data, 0)
print 'Unpacked Values:', unpacked_data
## Unpacked Values: (46638, 943.29999999943209)
exit(0)
array.arrays have a byteswap method:
import binascii
import struct
import array
x = binascii.unhexlify('b62e000052e366667a66408d')
y = array.array('h', x)
y.byteswap()
s = struct.Struct('<Id')
print(s.unpack_from(y))
# (46638, 943.2999999994321)
The h in array.array('h', x) was chosen because it tells array.array to regard the data in x as an array of 2-byte shorts. The important thing is that each item be regarded as being 2-bytes long. H, which signifies 2-byte unsigned short, works just as well.
This should do exactly what unutbu's version does, but might be slightly easier to follow for some...
from binascii import unhexlify
from struct import pack, unpack
orig = unhexlify('b62e000052e366667a66408d')
swapped = pack('<6h', *unpack('>6h', orig))
print unpack('<Id', swapped)
# (46638, 943.2999999994321)
Basically, unpack 6 shorts big-endian, repack as 6 shorts little-endian.
Again, same thing that unutbu's code does, and you should use his.
edit Just realized I get to use my favorite Python idiom for this... Don't do this either:
orig = 'b62e000052e366667a66408d'
swap =''.join(sum([(c,d,a,b) for a,b,c,d in zip(*[iter(orig)]*4)], ()))
# '2eb60000e3526666667a8d40'
The swap from 'data_string_in_orig' to 'data_string_in_swapped' may also be done with comprehensions without using any imports:
>>> d = 'b62e000052e366667a66408d'
>>> "".join([m[2:4]+m[0:2] for m in [d[i:i+4] for i in range(0,len(d),4)]])
'2eb60000e3526666667a8d40'
The comprehension works for swapping byte order in hex strings representing 16-bit words. Modifying it for a different word-length is trivial. We can make a general hex digit order swap function also:
def swap_order(d, wsz=4, gsz=2 ):
    return "".join(["".join([m[i:i+gsz] for i in range(wsz-gsz,-gsz,-gsz)]) for m in [d[i:i+wsz] for i in range(0,len(d),wsz)]])
The input params are:
d : the input hex string
wsz: the word-size in nibbles (e.g for 16-bit words wsz=4, for 32-bit words wsz=8)
gsz: the number of nibbles which stay together (e.g for reordering bytes gsz=2, for reordering 16-bit words gsz = 4)
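The same idea can be applied to the decoded bytes rather than the hex text, which lets the result go straight into struct. A minimal sketch, where `swap_bytes_in_words` is a hypothetical helper name and the word size is counted in bytes instead of nibbles:

```python
import binascii
import struct

def swap_bytes_in_words(data, word_size=2):
    """Reverse the byte order inside each word_size-byte word of data."""
    if len(data) % word_size:
        raise ValueError('data length must be a multiple of word_size')
    return b''.join(data[i:i + word_size][::-1]
                    for i in range(0, len(data), word_size))

raw = binascii.unhexlify('b62e000052e366667a66408d')
swapped = swap_bytes_in_words(raw)            # swap within 16-bit words
count, value = struct.unpack('<Id', swapped)  # uint32 followed by a double
print(binascii.hexlify(swapped), count, value)
```

Passing `word_size=4` would reverse the bytes within 32-bit words instead.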
import binascii, tkinter, array
from tkinter import filedialog
infile = filedialog.askopenfilename()
with open(infile, 'rb') as infile_:
    infile_read = infile_.read()
x = infile_read
y = array.array('I', x) # 'I' is an unsigned int, 4 bytes on most platforms
y.byteswap()
swapped = binascii.hexlify(y)
This is a 32-bit word swap that I achieved with code very much like unutbu's answer, just a little easier to understand. Technically binascii is not needed for the swap; only array.byteswap is.
