Converting binary timestamp to string - python

I'm trying to parse a proprietary binary format (Wintec NAL) with Python. There is existing, working C code that does the same (author: Dennis Heynlein), which I'm trying to port to Python.
I'm struggling to understand parts of the C code. Here's the definition of the binary format in C:
/*
* File extension: .NAL
* File format: binary, 32 byte fixed block length
*/
/*
* For now we will read raw structs direct from the data file, ignoring byte
* order issues (since the data is in little-endian form compatible with i386)
*
* XXX TODO: write marshalling functions to read records in the proper
* byte-order agnostic way.
*/
#pragma pack (1)
typedef struct nal_data32 {
    unsigned char point_type; /* 0 - normal, 1 - start, 2 - marked */
    unsigned char padding_1;
    unsigned int second: 6, minute: 6, hour: 5;
    unsigned int day: 5, month: 4, year: 6; /* add 2000 to year */
    signed int latitude; /* divide by 1E7 for degrees */
    signed int longitude; /* divide by 1E7 for degrees */
    unsigned short height; /* meters */
    signed char temperature; /* °C */
    unsigned short pressure; /* mbar */
    unsigned char cadence; /* RPM */
    unsigned char pulse; /* BPM */
    signed char slope; /* degrees */
    signed short compass; /* °Z axis */
    signed short roll; /* °X axis */
    signed short yaw; /* °Y axis */
    unsigned char speed; /* km/h */
    unsigned char bike; /* ID# 0-3 */
    unsigned char padding_2;
    unsigned char padding_3;
} nal_t;
I'm using python-bitstring to replicate this functionality in Python, but I have difficulty understanding the time format given above and adapting it to Python.
from bitstring import ConstBitStream
nal_format=('''
uint:8,
uint:8,
bin:32,
intle:32,
intle:32,
uint:16,
uint:8,
uint:16,
uint:8,
uint:8,
uint:8,
uint:16,
uint:16,
uint:16,
uint:8,
uint:8,
uint:8,
uint:8
''')
f = ConstBitStream('0x01009f5a06379ae1cb13f7a6b62bca010dc703000000c300fefff9ff00000000')
f.pos=0
#type,padding1,second,minute,hour,day,month,year,lat,lon,height,temp,press,cad,pulse,slope,compass,roll,yaw,speed,bike,padding2,padding3=f.peeklist(nal_format)
type,padding1,time,lat,lon,height,temp,press,cad,pulse,slope,compass,roll,yaw,speed,bike,padding2,padding3=f.readlist(nal_format)
print type
print padding1
#print second
#print minute
#print hour
#print day
#print month
#print year
print time
print lat
print lon
While I've figured out that latitude and longitude have to be defined as little-endian, I have no idea how to adapt the 32-bit-wide timestamp so it fits the format given in the C definition. (I also couldn't figure out a matching mask for "height", so I didn't try the fields after it.)
These are the values for the hex-string above:
date: 2013/12/03-T05:42:31
position: 73.3390583° E, 33.2128666° N
compass: 195°, roll -2°, yaw -7°
alt: 458 meters
temp: 13 °C
pres: 967 mb

I'm not familiar with bitstring, so I'll convert your input into packed binary data and then use struct to handle it. Skip to the break if you're uninterested in that part.
import binascii
packed = binascii.unhexlify('01009f5a06379ae1cb13f7a6b62bca010dc703000000c300fefff9ff00000000')
I can go over this part in more detail if you want. It's just turning '0100...' into b'\x01\x00...'.
Now, the only "gotcha" in unpacking this is figuring out that you only want to unpack ONE unsigned int, since that bit field fits into 32 bits (the width of a single unsigned int):
format = '<ccIiiHbHBBbhhhBBBB'
import struct
struct.unpack(format,packed)
Out[49]:
('\x01',
'\x00',
923163295,
...
)
That converts the packed input into values we can use. You can unpack that into your long list of variables, like you were doing before.
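As a rough sketch of unpacking the whole record into named fields (Python 3): the `NalRecord` namedtuple and the `time_bits` name are my own labels, with the other names copied from the C struct. This variant uses `B` rather than `c` for the first two fields so that they come back as integers instead of one-character strings.

```python
import binascii
import struct
from collections import namedtuple

# Field names mirror the C struct; "time_bits" is a made-up name for the
# 32-bit word that holds the packed date/time bit fields.
NalRecord = namedtuple(
    "NalRecord",
    "point_type padding_1 time_bits latitude longitude height temperature "
    "pressure cadence pulse slope compass roll yaw speed bike padding_2 padding_3",
)

# '<' = little-endian, standard sizes, no padding (matches #pragma pack(1)).
NAL_STRUCT = struct.Struct("<BBIiiHbHBBbhhhBBBB")

packed = binascii.unhexlify(
    "01009f5a06379ae1cb13f7a6b62bca010dc703000000c300fefff9ff00000000"
)
record = NalRecord._make(NAL_STRUCT.unpack(packed))

print(record.latitude / 1e7, record.longitude / 1e7)       # degrees
print(record.height, record.temperature, record.pressure)  # 458 13 967
print(record.compass, record.roll, record.yaw)             # 195 -2 -7
```

These match the expected values listed in the question, which is a reasonable sign that the format string lines up with the struct.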
Now, your question seemed to be centered around how to mask time (above: 923163295) to get the proper values out of the bit field. That's just a little bit of math:
second_mask = 2**6 - 1
minute_mask = second_mask << 6
hour_mask = (2**5 - 1) << (6+6)
day_mask = hour_mask << 5
month_mask = (2**4 - 1) << (6+6+5+5)
year_mask = (2**6 - 1) << (6+6+5+5+4)
time & second_mask
Out[59]: 31
(time & minute_mask) >> 6
Out[63]: 42
(time & hour_mask) >> (6+6)
Out[64]: 5
(time & day_mask) >> (6+6+5)
Out[65]: 3
(time & month_mask) >> (6+6+5+5)
Out[66]: 12
(time & year_mask) >> (6+6+5+5+4)
Out[67]: 13L
In function form, the whole thing is a bit more natural:
def unmask(num, width, offset):
    return (num & (2**width - 1) << offset) >> offset
Which (now that I think about it) rearranges into:
def unmask(num, width, offset):
    return (num >> offset) & (2**width - 1)
unmask(time, 6, 0)
Out[77]: 31
unmask(time, 6, 6)
Out[78]: 42
#etc
And if you want to get fancy,
from itertools import starmap
from functools import partial
width_offsets = [(6,0),(6,6),(5,12),(5,17),(4,22),(6,26)]
list(starmap(partial(unmask,time), width_offsets))
Out[166]: [31, 42, 5, 3, 12, 13L]
Format all those numbers correctly and finally out comes the expected date/time:
'20{:02d}/{:02d}/{:02d}-T{:02d}:{:02d}:{:02d}'.format(*reversed(_))
Out[167]: '2013/12/03-T05:42:31'
(There is likely a way to do all of this bitwise math elegantly with that bitstring module, but I just find it satisfying to solve things from first principles.)
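Putting the pieces together, here is a minimal sketch (Python 3) that turns the 32-bit value into a `datetime`. The `decode_nal_time` name is hypothetical; the widths and offsets are the ones worked out above, and the `2000 +` offset comes from the comment in the C struct.

```python
from datetime import datetime

def unmask(num, width, offset):
    """Return the `width`-bit field that starts `offset` bits above the LSB."""
    return (num >> offset) & ((1 << width) - 1)

def decode_nal_time(time_bits):
    # (width, offset) pairs in declaration order:
    # second, minute, hour, day, month, year
    layout = [(6, 0), (6, 6), (5, 12), (5, 17), (4, 22), (6, 26)]
    second, minute, hour, day, month, year = (
        unmask(time_bits, width, offset) for width, offset in layout
    )
    return datetime(2000 + year, month, day, hour, minute, second)

stamp = decode_nal_time(923163295)
print(stamp.strftime("%Y/%m/%d-T%H:%M:%S"))  # 2013/12/03-T05:42:31
```

Note that `datetime` raises `ValueError` for out-of-range fields (month 0, for example), which may or may not be what you want for corrupt records.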

The timestamp in the C structure is a C bit field. The compiler uses the number after the colon to allocate that many bits within the larger field, in this case an unsigned int (4 bytes). The big gotcha with bit fields is that the order in which the bits are assigned is implementation-defined and in practice follows the endianness of the target, so they aren't very portable.
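Your Python format declaration already allocates the right amount of space for the timestamp. The six fields total 6+6+5+5+4+6 = 32 bits, so the two declaration lines pack into a single 4-byte unsigned int, which is consistent with the 32-byte fixed block length. The start of the format therefore stays as:
nal_format=('''
uint:8,
uint:8,
bin:32,
intle:32,
intle:32,
''')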
To represent the bit field in Python, use a bit array to hold the bits.
One other thing to be aware of is the #pragma pack(1) on the structure. It tells the compiler to align on one-byte boundaries; in other words, not to add any padding between fields. Without it, the compiler typically aligns each field to its natural boundary (for example, a 4-byte int on a 4-byte boundary), inserting padding bytes where needed.
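One way to see the effect of the packing from Python is to compare sizes with the standard `struct` module. This is only a sketch: the format string is my transcription of the C struct (one `I` for the whole bit field), and the native size depends on the platform.

```python
import struct

# One format code per C member; the six bit fields share a single 'I'.
fields = "BBIiiHbHBBbhhhBBBB"

packed_size = struct.calcsize("<" + fields)  # standard sizes, no padding
native_size = struct.calcsize("@" + fields)  # native sizes and alignment

print(packed_size)  # 32, the fixed block length of the file format
print(native_size)  # platform-dependent; larger wherever padding is inserted
```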

Related

How to convert 4 (signed int) to int?

I am working with bytes (receiving data from IoT devices).
There are a few terms that I don't understand.
For example:
If the document gives a data size of 2 (signed int), then for the next 2 values I should do:
((byteArray[0] << 8) + byteArray[1])
I don't actually understand why we should do it. Anyway, I need to know how to handle:
4 (signed int)
2 (signed int MSB) + 1 (unsigned int, decimal part)
==========================================
For example:
If the list's first value is 0x01, the next 2 values are the data we want, but it is a 2-byte signed int. My code to handle that is:
data = bytearray.fromhex(input)
#data size of 2 (signed int)
if data[0].to_bytes(1,'big') == b'\x01':
    wanttedData = ((data[1] << 8) + data[2])
#data size of 4 (signed int)
The struct package is a good way to convert byte data into various types. You need to know the endianness of the data. From your example the data appears to be big endian.
For example, if the data is:
byte 0 is an 8-bit signed integer
bytes 1-2 are a 16-bit signed integer
bytes 3-6 are a 32-bit unsigned integer
then you can decode the data using a format of ">bhI" (the > indicates big-endian, and each letter corresponds to each data type), and you can extract the three values with:
import struct
byte_string = b'\x02\x03\x05\x12\x34\x56\xff'
val0, val1, val2 = struct.Struct(">bhI").unpack_from(byte_string)
print(hex(val0), hex(val1), hex(val2)) # prints 0x2 0x305 0x123456ff
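On the "why": shifting the first byte left by 8 moves it into the high half of a 16-bit number, and adding the second byte fills in the low half. That gives the unsigned value; for a signed field you still need to subtract 0x10000 when the top bit is set. A small sketch, assuming big-endian data and a made-up payload (the `raw` bytes below are not from any real device):

```python
import struct

# Hypothetical payload: a 0x01 marker, a 2-byte signed int, a 4-byte signed int.
raw = bytes.fromhex("01fffefffffff9")

# By hand: combine the two bytes, then apply the two's-complement fix-up.
manual = (raw[1] << 8) | raw[2]
if manual & 0x8000:
    manual -= 0x10000

# The same conversion with the standard library.
two_byte = int.from_bytes(raw[1:3], "big", signed=True)
four_byte = int.from_bytes(raw[3:7], "big", signed=True)

# Or with struct, reading at an offset without slicing.
(four_byte_struct,) = struct.unpack_from(">i", raw, 3)

print(manual, two_byte, four_byte, four_byte_struct)  # -2 -2 -7 -7
```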

Reading type "struct tm" from binary file using Python

I am trying to read the time at which a binary file was saved from information in the binary file. This information is stored as a type "struct tm" of size eighteen bytes --
struct tm
{
    int tm_sec; // seconds [0,61]
    int tm_min; // minutes [0,59]
    int tm_hour; // hour [0,23]
    int tm_mday; // day of month [1,31]
    int tm_mon; // month of year [0,11]
    int tm_year; // years since 1900
    int tm_wday; // day of week [0,6] (Sunday = 0)
    int tm_yday; // day of year [0,365]
    int tm_isdst; // daylight savings flag
};
I have tried using the struct module for Python. I'm also reading the binary file for other values using this module, and this has worked well, but for discrete float values of four bytes each. I have used the following code to try and read the integer values in struct tm
time_tgt = struct.unpack('9h', binconts[160:(160 + 18)])[0]
The slice of binconts starts at the offset (160) and ends at 160 plus the size of the data type, which here is 18.
My thought process is that, at 18 bytes, each of the nine components of struct tm occupies 2 bytes, so each is a short integer, hence h as the format code for struct.unpack. However, I've had no luck getting back a series of values, which is what I would expect.
Is struct the appropriate tool to use? Is struct tm here a sequence of individual int values that are amalgamated into one type?
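One thing worth checking in the snippet above: `struct.unpack` returns a tuple, and the trailing `[0]` keeps only its first element. A minimal sketch of reading all nine values follows, assuming the fields really are 2-byte little-endian integers as the 18-byte size suggests; the `sample_tm` data and the filler bytes are invented stand-ins for the real file.

```python
import struct

# Stand-in for the real file contents: 160 filler bytes, then nine 2-byte ints.
sample_tm = (31, 42, 5, 3, 11, 113, 2, 336, 0)
binconts = bytes(160) + struct.pack("<9h", *sample_tm)

# unpack_from reads at an offset and returns a tuple of all nine values.
fields = struct.unpack_from("<9h", binconts, 160)
(tm_sec, tm_min, tm_hour, tm_mday, tm_mon,
 tm_year, tm_wday, tm_yday, tm_isdst) = fields

print(fields)
print(1900 + tm_year, tm_mon + 1, tm_mday)  # 2013 12 3
```

If the decoded values look implausible, the byte order or the field width assumed here is probably wrong for that file.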

How do I create a Python bytes object in the C API

I have a Numpy vector of bools and I'm trying to use the C API to get a bytes object as quickly as possible from it. (Ideally, I want to map the binary value of the vector to the bytes object.)
I can read in the vector successfully and I have the data in bool_vec_arr. I thought of creating an int and setting its bits in this way:
PyBytesObject * pbo;
int byte = 0;
int i = 0;
while ( i < vec->dimensions[0] )
{
    if ( bool_vec_arr[i] )
    {
        byte |= 1UL << i % 8;
    }
    i++;
    if (i % 8 == 0)
    {
        /* do something here? */
        byte = 0;
    }
}
return Py_BuildValue("S", pbo);
But I'm not sure how to use the value of byte in pbo. Does anyone have any suggestions?
You need to store each byte once you've completed it. Your problem is that you haven't made an actual bytes object to populate, so do that. You know how long the result must be (one-eighth the size of the bool vector, rounded up), so use PyBytes_FromStringAndSize to get a bytes object of the correct size, then populate it as you go.
You'd just allocate with:
// Preallocate enough bytes (PyBytes_FromStringAndSize returns a PyObject *)
PyObject *pbo = PyBytes_FromStringAndSize(NULL, (vec->dimensions[0] + 7) / 8);
// Put check for NULL here
// Extract pointer to underlying buffer
char *bytebuffer = PyBytes_AsString(pbo);
where adding 7 then dividing by 8 rounds up to ensure you have enough bytes for all the bits, then assign to the appropriate index when you've finished a byte, e.g.:
if (i % 8 == 0)
{
    bytebuffer[i / 8 - 1] = byte; // Store completed byte to next index
    byte = 0;
}
If the final byte might be incomplete, you'll need to decide how to handle this (do the pad bits appear on the left or right, is the final byte omitted and therefore you shouldn't round up the allocation, etc.).
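To make the packing logic concrete, here is a pure-Python model of it (not C API code; `pack_bools` is a hypothetical name). It uses the same bit order as the loop in the question, with element i going to bit i % 8 of byte i // 8, and it writes straight into a zero-filled buffer, so an incomplete final byte simply ends up with zero pad bits in its high-order positions.

```python
def pack_bools(flags):
    """Pack booleans into bytes: element i sets bit (i % 8) of byte i // 8."""
    out = bytearray((len(flags) + 7) // 8)  # rounded up, zero-filled
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

# Ten flags: bit 0 of the first byte, bits 0 and 1 of the second.
print(pack_bools([True] + [False] * 7 + [True, True]))  # b'\x01\x03'
```

In the C version the equivalent would be to zero the buffer first, since PyBytes_FromStringAndSize with a NULL source leaves the contents uninitialized, and then OR bits into `bytebuffer[i / 8]` directly.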

char array to unsigned char python

I'm trying to translate this C++ code into Python, but I'm having problems with the char* to ushort* conversion:
void sendAsciiCommand(string command) {
    unsigned int nchars = command.length() + 1; // Char count of command string
    unsigned int nshorts = ceil(nchars / 2); // Number of shorts to store the string
    std::vector<unsigned short> regs(nshorts); // Vector of short registers
    // Transform char array to short array with endianness conversion
    unsigned short *ascii_short_ptr = (unsigned short *)(command.c_str());
    for (unsigned int i = 0; i < nshorts; i++)
        regs[i] = htons(ascii_short_ptr[i]);
    return std::string((char *)regs.data());
}
So far I have tried this code in Python 2.7:
from math import ceil
from array import array
command = "hello"
nchars = len(command) + 1
nshorts = ceil(nchars/2)
regs = array("H", command)
But it gives me the error:
ValueError: string length not a multiple of item size
Any help?
The exception text:
ValueError: string length not a multiple of item size
means what it says, i.e., the length of the string from which you are trying to create an array must be a multiple of the item size. In this case the item size is that of an unsigned short, which is 2 bytes. Therefore the length of the string must be a multiple of 2. hello has length 5, which is not a multiple of 2, so you can't create an array of 2-byte integers from it. It will work if the string is 6 bytes long, e.g. hello!.
>>> array("H", 'hello!')
array('H', [25960, 27756, 8559])
You might still need to convert to network byte order. array uses the native byte order on your machine, so if your native byte order is little endian you will need to convert it to big endian (network byte order). Use sys.byteorder to check and array.byteswap() to swap the byte order if required:
import sys
from array import array
s = 'hello!'
regs = array('H', s)
print(regs)
# array('H', [25960, 27756, 8559])
if sys.byteorder != 'big':
    regs.byteswap()
print(regs)
# array('H', [26725, 27756, 28449])
However, it's easier to use struct.unpack() to convert straight to network byte order if necessary:
import struct
s = 'hello!'
n = len(s) // struct.calcsize('H')  # integer division, so n stays an int
regs = struct.unpack('!{}H'.format(n), s)
print(regs)
#(26725, 27756, 28449)
If you really need an array:
regs = array('H', struct.unpack('!{}H'.format(n), s))
It's also worth pointing out that your C++ code contains an error. If the string length is odd, an extra byte will be read at the end of the string and this will be included in the converted data. That extra byte will be \0, as the C string should be null-terminated, but the last unsigned short should either be ignored, or you should check that the length of the string is a multiple of the size of an unsigned short, just as Python does.
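For reference, a minimal Python 3 sketch of the same conversion that handles odd lengths by padding. The `ascii_to_registers` name is hypothetical, and appending a NUL terminator and padding to an even length is my assumption about what the receiving side expects.

```python
import struct

def ascii_to_registers(command):
    """Encode `command` plus a NUL terminator as big-endian 16-bit values."""
    data = command.encode("ascii") + b"\x00"
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 2-byte registers
    return struct.unpack(">{}H".format(len(data) // 2), data)

print(ascii_to_registers("hello"))   # (26725, 27756, 28416)
print(ascii_to_registers("hello!"))  # (26725, 27756, 28449, 0)
```

Using `>` in the format string gives network byte order regardless of the host, so no `sys.byteorder` check is needed.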

struct.error: unpack requires a string argument of length 4

Python says I need 4 bytes for a format code of "BH":
struct.error: unpack requires a string argument of length 4
Here is the code, I am putting in 3 bytes as I think is needed:
major, minor = struct.unpack("BH", self.fp.read(3))
"B" Unsigned char (1 byte) + "H" Unsigned short (2 bytes) = 3 bytes (!?)
struct.calcsize("BH") says 4 bytes.
EDIT: The file is ~800 MB and this is in the first few bytes of the file so I'm fairly certain there's data left to be read.
The struct module mimics C structures. It takes more CPU cycles for a processor to read a 16-bit word on an odd address or a 32-bit dword on an address not divisible by 4, so structures add "pad bytes" to make structure members fall on natural boundaries. Consider:
struct {                      11
    char a;         012345678901
    short b;        ------------
    char c;         axbbcxxxdddd
    int d;
};
This structure will occupy 12 bytes of memory (x being pad bytes).
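Python works similarly (see the struct documentation). The output below is from a platform where the native L is 4 bytes, such as Windows; where the native long is 8 bytes, as on 64-bit Linux, the sizes differ: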
>>> import struct
>>> struct.pack('BHBL',1,2,3,4)
'\x01\x00\x02\x00\x03\x00\x00\x00\x04\x00\x00\x00'
>>> struct.calcsize('BHBL')
12
Compilers usually have a way of eliminating padding. In Python, any of =<>! will eliminate padding:
>>> struct.calcsize('=BHBL')
8
>>> struct.pack('=BHBL',1,2,3,4)
'\x01\x02\x00\x03\x04\x00\x00\x00'
Beware of letting struct handle padding. In C, these structures:
struct A {          struct B {
    short a;            int a;
    char b;             char b;
};                  };
are typically 4 and 8 bytes, respectively. The padding occurs at the end of the structure in case the structures are used in an array. This keeps the 'a' members aligned on correct boundaries for structures later in the array. Python's struct module does not pad at the end:
>>> struct.pack('LB',1,2)
'\x01\x00\x00\x00\x02'
>>> struct.pack('LBLB',1,2,3,4)
'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04'
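If you do want C-style trailing padding in native mode, the struct documentation describes a trick: end the format with a zero-repeat code of the type whose alignment you need. A short sketch using `i` (the exact sizes depend on the platform, so none are asserted in the comments):

```python
import struct

unpadded = struct.calcsize("iB")   # int + char, no trailing padding
padded = struct.calcsize("iB0i")   # '0i' pads the end to int alignment

print(unpadded, padded)  # typically 5 and 8
```

This only applies to native mode; with any of `=`, `<`, `>` or `!` there is no alignment and the zero-count code adds nothing.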
By default, on many platforms the short will be aligned to an offset at a multiple of 2, so there will be a padding byte added after the char.
To disable this, use: struct.unpack("=BH", data). This uses standard sizes with no alignment, so no padding is added:
>>> struct.calcsize('=BH')
3
The = character will use native byte ordering. You can also use < or > instead of = to force little-endian or big-endian byte ordering, respectively.
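Applied to the code in the question, a minimal sketch looks like this. The three bytes in `data` are made up, and little-endian is an assumption; use `>` instead if the file format is big-endian.

```python
import struct

data = b"\x01\x02\x03"  # hypothetical first three bytes of the file

print(struct.calcsize("BH"))   # native alignment; 4 on most platforms
print(struct.calcsize("<BH"))  # 3

major, minor = struct.unpack("<BH", data)
print(major, minor)  # 1 770
```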
