I'm currently working on a project that involves some very remote data gathering. At the end of each day, a very short summary is sent back to the server via a satellite connection.
Because sending data over the satellite is very expensive, I want the data to be as compact as possible. Additionally, the service I'm using only allows for sending data in the following two formats: ASCII, hexadecimal. Most of the data I will be sending consists of floats, where the precision should be as high as possible without taking up too much space.
Below I have a working version of what I'm currently using, but there should be a more efficient way to store the data. Any help would be much appreciated.
import ctypes
#------------------------------------------------------------------
#This part is known to the client as well as the server
class my_struct(ctypes.Structure):
    _pack_ = 1
    _fields_ = [('someText', ctypes.c_char * 12),
                ('underRange', ctypes.c_float),
                ('overRange', ctypes.c_float),
                ('TXPDO', ctypes.c_float)]

def print_struct(filled_struct):
    d = filled_struct
    for name, typ in d._fields_:
        value = getattr(d, name)
        print('{:20} {:20} {}'.format(name, str(value), str(typ)))
#------------------------------------------------------------------
# This part of the code is performed on the client side
#Filling the struct with some random data, real data will come from sensors
s = my_struct()
s.someText = 'Hello World!'.encode()
s.underRange = 4.01234
s.overRange = 4.012345
s.TXPDO = 1.23456789
#Rounding errors occurred when converting to ctypes.c_float
print('Data sent:')
print_struct(s)
data = bytes(s) #Total length is 24 bytes (12 for the string + 3x4 for the floats)
data_hex = data.hex() #Total length is 48 bytes in hexadecimal format
#Now the data is sent over a satellite connection, it should be as small as possible
print('\nLength of sent data: ',len(data_hex),'bytes\n')
#------------------------------------------------------------------
# This part of the code is performed on the server side
def move_bytes_to_struct(struct_to_fill, bytes_to_move):
    adr = ctypes.addressof(struct_to_fill)
    struct_size = ctypes.sizeof(struct_to_fill)
    bytes_to_move = bytes_to_move[0:struct_size]
    ctypes.memmove(adr, bytes_to_move, struct_size)
    return struct_to_fill
#Data received can be assumed to be the same as data sent
data_hex_received = data_hex
data_bytes = bytes.fromhex(data_hex_received)
data_received = move_bytes_to_struct(my_struct(), data_bytes)
print('Data received:')
print_struct(data_received)
I wonder if you are overcomplicating things a bit.
The struct module will let you do pretty much what you are doing, but simpler:
struct.pack("fffs12", 4.01234, 4.012345, 1.23456789, 'Hello World!'.encode())
This of course depends on you knowing how long your string is, but you could also not care about that:
struct.pack("fff", 4.01234, 4.012345, 1.23456789) + 'Hello World!'.encode()
But, about storing things more efficiently:
The more you know about your data, the more shortcuts you can take.
Is the string ASCII only? You could squeeze each character into maybe 7 bits, or even 6.
That could bring your 12-byte string down to 9 bytes.
If you know the range of your floats, you could perhaps trim those as well (see the sketch below).
If you can send larger batches, compression might help as well.
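As a sketch of that range trick: if a float always falls within a known range, you can scale it to a 16-bit integer and halve its size. The 0–10 range below is a made-up assumption, not part of your setup:

import struct

LOW, HIGH = 0.0, 10.0  # assumed sensor range

def encode(value):
    # Map [LOW, HIGH] onto the full unsigned 16-bit range.
    scaled = round((value - LOW) / (HIGH - LOW) * 0xFFFF)
    return struct.pack('<H', scaled)

def decode(packed):
    scaled, = struct.unpack('<H', packed)
    return scaled / 0xFFFF * (HIGH - LOW) + LOW

print(decode(encode(4.012345)))  # ~4.01236, recovered from 2 bytes instead of 4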
I have a script that reads some data from a binary stream of "packets" containing "parameters". The parameters read are stored in a dictionary for each packet, which is appended to an array representing the packet stream.
At the end this array of dict is written to an output CSV file.
Among the read data is a CUC7 datetime, stored as coarse/fine integer parts of a GPS time, which I also want to convert to a UTC ISO string.
from astropy.time import Time
def cuc2gps_time(coarse, fine):
    return Time(coarse + fine / (2**24), format='gps')

def gps2utc_time(gps):
    return Time(gps, format='isot', scale='utc')
The issue is that I realized that these two time conversions make up 90% of the total run time of my script, most of the work of my script being done in the 10% remaining (read binary file, decode 15 other parameters, write to CSV).
I somehow improved the situation by making the conversions in batches on Numpy arrays, instead of on each packet. This reduces the total runtime by about half.
import numpy as np
while end_not_reached:
    # Read 1 packet
    # (...)
    nb_packets += 1
    end_not_reached = ...  # boolean

    # Process in batches for better performance
    if not nb_packets % 1000 or not end_not_reached:
        # Convert CUC7 time to GPS and UTC times
        all_coarse = np.array([packet['lobt_coarse'] for packet in packets])
        all_fine = np.array([packet['lobt_fine'] for packet in packets])
        all_gps = cuc2gps_time(all_coarse, all_fine)
        all_utc = gps2utc_time(all_gps)

        # Add times to each packet
        for packet, gps_time, utc_time in zip(packets, all_gps, all_utc):
            packet.update({'gps_time': gps_time, 'utc_time': utc_time})
But my script is still absurdly slow. Reading 60,000 packets from a 1.2 GB file and writing them to a CSV takes 12 s, against only 2.5 s if I remove the time conversions.
So:
Is it expected that Astropy time conversions are so slow? Am I using it wrong? Is there a better library?
Is there a way to improve my current implementation? I suspect that the remaining "for" loop in there is very costly, but could not find a good way to replace it.
I think the problem is that you're doing multiple loops over your sequence of packets. In Python I would recommend having arrays representing each parameter, instead of having a list of objects, each with a bunch of scalar parameters.
If you can read all the packets in at once, I would recommend something like:
num_bytes = ...
num_bytes_per_packet = ...
num_packets = num_bytes // num_bytes_per_packet

param_1 = np.empty(num_packets)
param_2 = np.empty(num_packets)
...
time_coarse = np.empty(num_packets)
time_fine = np.empty(num_packets)
...
param_N = np.empty(num_packets)

for i in range(num_packets):
    param_1[i], param_2[i], ..., time_coarse[i], time_fine[i], ..., param_N[i] = decode_packet(...)

time_gps = Time(time_coarse + time_fine / (2**24), format='gps')
time_utc = time_gps.utc.isot
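As a quick illustration of why the batch form helps, a single vectorised Time call over whole arrays replaces per-packet object construction. The coarse/fine values below are made up just to exercise the conversion:

import numpy as np
from astropy.time import Time

# Fake CUC7 fields for 60000 packets, only to demonstrate the batch conversion.
time_coarse = np.full(60000, 1_300_000_000, dtype=np.int64)
time_fine = np.random.randint(0, 2**24, size=60000)

time_gps = Time(time_coarse + time_fine / (2**24), format='gps')
time_utc = time_gps.utc.isot  # one ISO string per packet, as a numpy array
print(time_utc[:3])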
I'm a heavy LabVIEW user who is just starting to learn Python. I work with industrial and aerospace equipment a lot and something I need to do very often is process some data, then export it over some communications protocol in binary. For example, let's say I have a packet that contains a struct/cluster/other-complex-data-element that has the following underlying data elements:
sync - unsigned 32-bit integer
time - 64-bit double
payload ID - 16-bit signed integer
source - 16-bit signed integer
destination - 16-bit signed integer
payload length - 32 bit signed integer
data 1 - 8-bit unsigned integer
data 2 - 8-bit unsigned integer
data 3 - 8-bit unsigned integer
data 4 - 8-bit unsigned integer
data 4 - 64-bit double
data 5 - 32-bit single
data 6 - 16-bit unsigned integer
crc - 32-bit unsigned integer
(This frame should be 42 bytes long)
I call this a frame: there is some header information, a payload, then a CRC; I think that's a common term for what I'm creating. The data types and their locations in the byte stream are critical. Any extraneous or missing bytes break the data transfer protocol and cannot be tolerated.
My question is this:
How do you achieve this easily in Python? In LabVIEW (and probably other languages), there are good built-in functions and methods to clearly define the data types and then flatten them to a string of bytes, which is very efficient. It seems that with pickling, there are things going on that I don't understand.
In my example code, I have a simple function to get some memory information, then serialize it. I would expect the integer version to have 88 bytes and the float version to have 172 bytes, but I get 87 and 115 respectively. Here is the code, thanks for your help!
import psutil
import time
import pickle
def getMemoryInfo():
    while True:
        virtual_memory = psutil.virtual_memory()
        swap_memory = psutil.swap_memory()
        memoryInfo = list(virtual_memory + swap_memory)
        # memoryInfo = [float(x) for x in memoryInfo]
        time.sleep(1.000)
        print(memoryInfo)
        string = pickle.dumps(memoryInfo)
        print(string)
        print(len(memoryInfo))
        print(len(string))

getMemoryInfo()
The struct module worked for me. It was a little more tedious than I would have hoped, but it worked just fine. Here is the code I ended up using:
import binascii
import struct
import time

def build_frame(payload, payload_class, payload_id, source, destination):
    # form frame header and payload
    frame = {"sync": int(0x64617665),
             "absolute_time": time.time(),
             "relative_time": time.monotonic(),
             "source": int(source),
             "destination": int(destination),
             "counter": 0,
             "payload_class": int(payload_class),
             "payload_id": int(payload_id),
             "payload_length": int(len(payload)),
             "payload": payload}

    # form bytearray to crc
    sync = struct.pack('<I', frame['sync'])
    absolute_time = struct.pack('<d', frame['absolute_time'])
    relative_time = struct.pack('<d', frame['relative_time'])
    source = struct.pack('i', frame['source'])
    destination = struct.pack('i', frame['destination'])
    counter = struct.pack('I', frame['counter'])
    payload_class = struct.pack('i', frame['payload_class'])
    payload_id = struct.pack('i', frame['payload_id'])
    payload_length = struct.pack('i', frame['payload_length'])
    payload_bytes = frame['payload']
    crc_bytes = (sync + absolute_time + relative_time + source + destination
                 + counter + payload_class + payload_id + payload_length + payload_bytes)

    # crc bytes and add to frame
    frame['crc'] = binascii.crc32(crc_bytes)
    return frame
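If the field order is fixed, the header bytes could also be produced with a single format string, which keeps the byte ordering in one place. A sketch of that idea, reusing the frame dict built above; the uniform little-endian '<' prefix is my choice, whereas the code above mixes '<' and native order:

import struct

HEADER_FMT = '<IddiiIiii'  # sync, absolute_time, relative_time, source, destination,
                           # counter, payload_class, payload_id, payload_length

def frame_header_bytes(frame):
    return struct.pack(HEADER_FMT,
                       frame['sync'], frame['absolute_time'], frame['relative_time'],
                       frame['source'], frame['destination'], frame['counter'],
                       frame['payload_class'], frame['payload_id'], frame['payload_length'])

# crc_bytes = frame_header_bytes(frame) + frame['payload']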
I'm trying to understand how to encode some data for transfer over BLE (bluetooth low energy).
Specifically, I'm interested in this particular line:
https://github.com/micropython/micropython/blob/05f95682e7ddfb08c317e83826df9a1d636676f3/ports/nrf/examples/ubluepy_temp.py#L68
Which comes from the snippet:
temp = Temp.read()
temp = temp * 100
char_temp.write(bytearray([temp & 0xFF, temp >> 8]))
Before we even come to the why part, I need to understand the how. In this snippet, the temperature is read from a sensor as a float, in Celsius. Let's say "20.00" for now. We then multiply it by 100, and then comes the encoding part:
2000 & 0xFF -> 208
2000 >> 8 -> 7
So we're basically sending:
>>> bytearray([208, 7])
bytearray(b'\xd0\x07')
Is this correct? I would say so, I checked it with my own device and this seems to be the data that is being sent, and it also works, I can read the temperature sent from my BLE device.
What I don't understand is why all these bit manipulations are required. For example, I tried to just send bytearray([hex(20)]) but it doesn't work (when trying to read the temperature from my phone, the data couldn't be parsed/converted).
Could you explain the format of the data sent please?
As shown in the tests, the straightforward way to do the conversion in Python is with the int.to_bytes and int.from_bytes functionality.
Using your data it would be for the bytes to an int direction:
>>> int.from_bytes([208, 7], byteorder='little', signed=True)
2000
From int to bytes:
>>> int(2000).to_bytes(2, byteorder='little', signed=True)
b'\xd0\x07'
Or from the reading to bytes:
>>> int(20.00*100).to_bytes(2, byteorder='little', signed=True)
b'\xd0\x07'
Bluetooth data should be a list or bytearray in little endian format. Each integer in the list needs to represent an octet.
From the source you linked to I can see the characteristic UUID is 0x2A6E:
uuid_temp = UUID("0x2A6E") # Temperature characteristic
This is an official UUID so is described in "16-bit UUID Numbers Document" at https://www.bluetooth.com/specifications/assigned-numbers/
There is a more detailed explanation in the GATT Specification Supplement document on the Bluetooth SIG website, in the section "2: Values and represented values". That document also explains how the Temperature characteristic value is represented.
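For example, assuming the Temperature characteristic (0x2A6E) is a signed 16-bit value in units of 0.01 °C, as the GATT documents describe, the same little-endian encoding can also be written with struct (a minimal sketch):

import struct

temp_c = 20.00                                    # reading in Celsius
payload = struct.pack('<h', round(temp_c * 100))  # sint16, little-endian, 0.01 degC units
print(payload)                                    # b'\xd0\x07'

decoded, = struct.unpack('<h', payload)
print(decoded / 100)                              # 20.0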
I have an industrial sensor which provides me information via telnet over port 10001.
It has a Data Format as follows:
Also the manual:
All the measuring values are transmitted int32 or uint32 or float depending on the sensors
Code
import telnetlib
import struct
import time
# IP Address, Port, timeout for Telnet
tn = telnetlib.Telnet("169.254.168.150", 10001, 10)
while True:
    op = tn.read_eager()  # currently read information limit this till preamble
    print(op[::-1])  # make little-endian
    if not len(op[::-1]) == 0:  # initially an empty bit starts (b'')
        data = struct.unpack('!4c', op[::-1])  # unpacking `MEAS`
    time.sleep(0.1)
My initial attempt:
Connect to the sensor
Read the data
Reverse it to little-endian
OUTPUT
b''
b'MEAS\x85\x8c\x8c\x07\xa7\x9d\x01\x0c\x15\x04\xf6MEAS'
b'\x04\xf6MEAS\x86\x8c\x8c\x07\xa7\x9e\x01\x0c\x15\x04\xf6'
b'\x15\x04\xf6MEAS\x85\x8c\x8c\x07\xa7\x9f\x01\x0c\x15'
b'\x15\x04\xf6MEAS\x87\x8c\x8c\x07\xa7\xa0\x01\x0c'
b'\xa7\xa2\x01\x0c\x15\x04\xf6MEAS\x87\x8c\x8c\x07\xa7\xa1\x01\x0c'
b'\x8c\x07\xa7\xa3\x01\x0c\x15\x04\xf6MEAS\x87\x8c\x8c\x07'
b'\x88\x8c\x8c\x07\xa7\xa4\x01\x0c\x15\x04\xf6MEAS\x88\x8c'
b'MEAS\x8b\x8c\x8c\x07\xa7\xa5\x01\x0c\x15\x04\xf6MEAS'
b'\x04\xf6MEAS\x8b\x8c\x8c\x07\xa7\xa6\x01\x0c\x15\x04\xf6'
b'\x15\x04\xf6MEAS\x8a\x8c\x8c\x07\xa7\xa7\x01\x0c\x15'
b'\x15\x04\xf6MEAS\x88\x8c\x8c\x07\xa7\xa8\x01\x0c'
b'\x01\x0c\x15\x04\xf6MEAS\x88\x8c\x8c\x07\xa7\xa9\x01\x0c'
b'\x8c\x07\xa7\xab\x01\x0c\x15\x04\xf6MEAS\x8b\x8c\x8c\x07\xa7\xaa'
b'\x8c\x8c\x07\xa7\xac\x01\x0c\x15\x04\xf6MEAS\x8c\x8c'
b'AS\x89\x8c\x8c\x07\xa7\xad\x01\x0c\x15\x04\xf6MEAS\x8a'
b'MEAS\x88\x8c\x8c\x07\xa7\xae\x01\x0c\x15\x04\xf6ME'
b'\x15\x04\xf6MEAS\x87\x8c\x8c\x07\xa7\xaf\x01\x0c\x15\x04\xf6'
b'\x15\x04\xf6MEAS\x8a\x8c\x8c\x07\xa7\xb0\x01\x0c'
b'\x0c\x15\x04\xf6MEAS\x8a\x8c\x8c\x07\xa7\xb1\x01\x0c'
b'\x07\xa7\xb3\x01\x0c\x15\x04\xf6MEAS\x89\x8c\x8c\x07\xa7\xb2\x01'
b'\x8c\x8c\x07\xa7\xb4\x01\x0c\x15\x04\xf6MEAS\x89\x8c\x8c'
b'\x85\x8c\x8c\x07\xa7\xb5\x01\x0c\x15\x04\xf6MEAS\x84'
b'MEAS\x87\x8c\x8c\x07\xa7\xb6\x01\x0c\x15\x04\xf6MEAS'
b'\x04\xf6MEAS\x8b\x8c\x8c\x07\xa7\xb7\x01\x0c\x15\x04\xf6'
b'\x15\x04\xf6MEAS\x8b\x8c\x8c\x07\xa7\xb8\x01\x0c\x15'
b'\x15\x04\xf6MEAS\x8a\x8c\x8c\x07\xa7\xb9\x01\x0c'
b'\xa7\xbb\x01\x0c\x15\x04\xf6MEAS\x87\x8c\x8c\x07\xa7\xba\x01\x0c'
try to unpack the preamble !?
How do I read information like article number, serial number, channel, status, and measuring value between the preambles?
The payload size seems to be fixed at 22 bytes here (seen via Wireshark).
Parsing the reversed buffer is just weird; please use struct's support for endianness. Using big-endian '!' in a little-endian context is also odd.
The first four bytes are a text constant. Ok, fine perhaps you'll need to reverse those. But just those, please.
After that, use struct.unpack to parse out 'IIQI'. So far, that was kind of working OK with your approach, since all fields consume 4 bytes or a pair of 4 bytes. But finding frame M's length is the fly in the ointment since it is just 2 bytes, so parse it with 'H', giving you a combined 'IIQIH'. After that, you'll need to advance by only that many bytes, and then expect another 'MEAS' text constant once you've exhausted that set of measurements.
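A rough sketch of that loop, under the assumption that each record is a 4-byte b'MEAS' marker followed by the 'IIQIH' fields described above; the field names here are only guesses, not taken from the sensor manual:

import struct

RECORD_FMT = '<IIQIH'  # four 32/64-bit fields plus a 16-bit length

def parse_stream(buf):
    offset = 0
    while offset + 4 + struct.calcsize(RECORD_FMT) <= len(buf):
        if buf[offset:offset + 4] != b'MEAS':
            raise ValueError('lost sync at offset %d' % offset)
        offset += 4
        article, serial, channel, status, length = struct.unpack_from(RECORD_FMT, buf, offset)
        offset += struct.calcsize(RECORD_FMT)
        measurements = buf[offset:offset + length]  # advance by the announced length only
        offset += length
        yield article, serial, channel, status, measurements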
I managed to avoid telnetlib altogether and created a TCP client using Python 3. I already had the payload size from my Wireshark dump (22 bytes), hence I keep receiving 22 bytes of information. Apparently the module sends two distinct 22-byte payloads:
The first (frame) payload has the preamble, serial, article and channel information.
The second (frame) payload has information like bytes per frame, measuring value counter, measuring value channel 1, measuring value channel 2, and measuring value channel 3.
The values are int32 and thus need a formula (given in the instruction manual) to be converted to real readings.
(As #J_H mentioned, the unpacking follows his answer, with small changes.)
Code
import socket
import time
import struct

DRANGEMIN = 3261
DRANGEMAX = 15853
MEASRANGE = 50
OFFSET = 35

# Create a TCP/IP socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_address = ('169.254.168.150', 10001)
print('connecting to %s port %s' % server_address)
sock.connect(server_address)

def value_mm(raw_val):
    return (((raw_val - DRANGEMIN) * MEASRANGE) / (DRANGEMAX - DRANGEMIN) + OFFSET)

if __name__ == '__main__':
    while True:
        Laser_Value = 0
        data = sock.recv(22)
        preamble, article, serial, x1, x2 = struct.unpack('<4sIIQH', data)
        if not preamble == b'SAEM':
            status, bpf, mValCounter, CH1, CH2, CH3 = struct.unpack('<hIIIII', data)
            #print(CH1, CH2, CH3)
            Laser_Value = CH3
            print(str(value_mm(Laser_Value)) + " mm")
            #print('RAW: ' + str(len(data)))
        print('\n')
        #time.sleep(0.1)
Sure enough, this gives me the information I need, and I verified it against the proprietary software the company provides.
I am working on a server engine in Python, for my game made in GameMaker Studio 2. I'm currently having some issues with making and sending a packet.
I've successfully managed to establish a connection and send the first packet, but I can't find a solution for handling a sequence of messages where, if the first value in the packed struct equals a given message ID, the rest of the data is unpacked with the corresponding format.
Example:
types = 'hhh' #(message_id, x, y) example
message_id = 0
x = 200
y = 200
buffer = pack(types, 0,x, y)
On the server side:
data = conn.recv(BUFFER_SIZE)
mid = unpack('h', data)[0]
if not data: break
if mid == 0:
    sequnce = 'hhh'
    x = unpack(sequnce, data)[1]
    y = unpack(sequnce, data)[2]
It looks like your subsequent decoding is going to vary based on the message ID?
If so, you will likely want to use unpack_from which allows you to pull only the first member from the data (as written now, your initial unpack call will generate an exception because the buffer you're handing it is not the right size). You can then have code that varies the unpacking format string based on the message ID. That code could look something like this:
from struct import pack, unpack, unpack_from

while True:
    data = conn.recv(BUFFER_SIZE)
    # End of file, bail out of loop
    if not data: break
    mid = unpack_from('!h', data)[0]
    if mid == 0:
        # Message ID 0
        types = '!hhh'
        _, x, y = unpack(types, data)
        # Process message type 0
        ...
    elif mid == 1:
        types = '!hIIq'
        _, v, w, z = unpack(types, data)
        # Process message type 1
        ...
    elif mid == 2:
        ...
Note that we're unpacking the message ID again in each case along with the ID-specific parameters. You could avoid that if you like by using the optional offset argument to unpack_from:
x, y = unpack_from('!hh', data, offset=2)
One other note of explanation: If you are sending messages between two different machines, you should consider the "endianness" (byte ordering). Not all machines are "little-endian" like x86. Accordingly it's conventional to send integers and other structured numerics in a certain defined byte order - traditionally that has been "network byte order" (which is big-endian) but either is okay as long as you're consistent. You can easily do that by prepending each format string with '!' or '<' as shown above (you'll need to do that for every format string on both sides).
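For instance, the same integer packed with the two prefixes (just a small illustration of the difference):

import struct

print(struct.pack('>h', 2000))  # b'\x07\xd0'  network / big-endian
print(struct.pack('<h', 2000))  # b'\xd0\x07'  little-endian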
Finally, the above code probably works fine for a simple "toy" application but as your program increases in scope and complexity, you should be aware that there is no guarantee that your single recv call actually receives all the bytes that were sent and no other bytes (such as bytes from a subsequently sent buffer). In other words, it's often necessary to add a buffering layer, or otherwise ensure that you have received and are operating on exactly the number of bytes you intended.
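A minimal sketch of one way to do that, assuming a blocking socket: keep calling recv until exactly the requested number of bytes has arrived (or the peer closes the connection).

def recv_exact(conn, nbytes):
    """Collect exactly nbytes from the socket; raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < nbytes:
        chunk = conn.recv(nbytes - len(buf))
        if not chunk:
            raise ConnectionError('socket closed after %d of %d bytes' % (len(buf), nbytes))
        buf.extend(chunk)
    return bytes(buf)

# e.g. one fixed-size '!hhh' message is struct.calcsize('!hhh') == 6 bytes:
# data = recv_exact(conn, 6)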
Could you unpack the whole data into a tuple and then check its elements in a loop? What is the reason for unpacking it three times? I guess you could unpack it once and then work with that result: check its length first, and if it's not empty, check the first element; if it equals the special value, continue parsing the rest. Did you try it like that?