Convert raw binary data to an array of floats - Python

I'm new to Python, but I need to get this project done in it. I'm using telnetlib to get some raw data from a device, and this is what the data looks like (this is only part of the output I get; the real one is about 10x bigger):
\xc2\xb2\xdd\x0f\xc2\xb2x/\xc2\xb2\x08\xb2M\xcf\xc2\xb2\xc5S\xc2\xb2\xd6[\xc2\xb2qw\xc2\xb1\xafK\xc2\xb1n+\xc2\xb2?\x83\xc2\xb1\xe3\xb7\xc2\xb0\xe8\x87\xc2\xb0\xf1\x8f\xc2\xb1x\xbf\xc2\xb1\xcbO\xc2\xb1\x98\x93\xc2\xb1\xd4\xc3\xc2\xb1\xf7\x9f\xc2\xb1\xb3\x97\xc2\xb1\xe7;\xc2\xb2\x97\xcb\xc2\xb2\xd3\xf3\xc2\xb2f\x8b\xc2\xb1\xc6\xdb\xc2\xb1\xadC\xc2\xb1t\xcf\xc2\xb1\x9c\xdf\xc2\xb1\xb7\x1b\xc2\xb1\xa3\xc2\xb1\t_\xc2\xb1v\xc3\xc2\xb1\xeb
The device documentation says that this is:
raw data: binary. An array of float values in big-endian format (not as a string).
The question is: how can I convert this data into an array of floats?
The code (Python 2):
import telnetlib
tn = telnetlib.Telnet(hostIP)
tn.read_until("connected")
tn.write("getData\r\n")
data = tn.read_until("\r\n")
print data
When I execute this script from a terminal I get some binary "garbage":
²\f²▒▒²▒V²▒²▒
³▒▒³u▒³:v³▒>³;>²W▒²O^²Xf²▒▒±▒▒²P▒²▒j²▒²▒³Pv³▒▒²▒n²:Z²▒±▒F±▒±7▒±#▒±t^±▒▒±▒▒²5:±▒"±▒~±ю±±*±▒°▒▒°{n°a▒°▒:°Q▒°[°cj°0▒¯▒▒¯▒▒r¯ޒ°▒°▒¯▒▒¯a▒¯▒°E▒°▒r°q*¯▒¯▒
If I do the same from the Python shell I get the \xc2\xb2\xdd\x0f\xc2... values.

You need to know the number of elements in the array in advance, or somehow infer the count, e.g. by counting the number of bytes and dividing by the float size. You then use the struct module to unpack the binary data:
import struct

assert len(data) % 8 == 0, "Data length not a multiple of 8"
L = []
for i in range(0, len(data), 8):
    # '>d' = big-endian 8-byte double; unpack returns a 1-tuple, hence [0]
    L.append(struct.unpack('>d', data[i:i+8])[0])
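Alternatively, since struct format strings accept a repeat count, the whole buffer can be unpacked in one call; a minimal sketch, assuming the same 8-byte big-endian doubles as above:
import struct

count = len(data) // 8
values = list(struct.unpack('>%dd' % count, data))  # one call instead of a loop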

Complementing vz0's answer, there is also struct.iter_unpack(), which will:
Iteratively unpack from the buffer buffer according to the format string format.
(see the struct documentation linked in the code below). So we can convert without any trouble:
import struct
import numpy as np
# Choose operators from https://docs.python.org/3/library/struct.html#format-strings
Byte_Order = '>' # big-endian, as the device documentation specifies
Format_Characters = 'f' # float (4 bytes)
data_format = Byte_Order + Format_Characters
# iter_unpack yields 1-tuples, so ravel() flattens the result to a 1-D array
r = np.array(list(struct.iter_unpack(data_format, data)), dtype=float).ravel()
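For completeness: numpy can interpret the buffer directly with an explicit big-endian dtype, skipping the struct loop entirely. A minimal sketch, assuming data is a bytes object of tightly packed 4-byte big-endian floats:
import numpy as np

r = np.frombuffer(data, dtype='>f4')  # 1-D array of big-endian float32 values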

Related

Python f.read() and Octave fread(): reading a binary file to get the same values

I'm reading a binary file with signal samples in both Octave and Python.
The thing is, I want to obtain the same values from both codes, which is currently not the case.
The binary file is basically a signal in complex I,Q format, recorded as 16-bit ints.
So, based on the Octave code:
[data, cnt_data] = fread(fid, 2 * secondOfData * fs, 'int16');
and then:
data = data(1:2:end) + 1i * data(2:2:end);
It seems simple: just read the binary data as 16-bit ints, then create the final array of complex numbers. Therefore I assume that in Python I need to do as follows:
rel=int(f.read(2).encode("hex"),16)
img=int(f.read(2).encode("hex"),16)
in_clean.append(complex(rel,img))
OK, the main problem I have is that the real and imaginary part values do not match.
For instance, in Octave, the first value is: -20390 - 10053i
While in Python (applying the code above), the value is: (23216+48088j)
As the signs are different, the first thing I thought was that maybe the endianness of the computer that recorded the file and the one I'm using to read it are different. So I turned to the unpack function, as it allows you to force the endianness.
I was not able to find an "int16" in the unpack documentation:
https://docs.python.org/2/library/struct.html
Therefore I went for the "i" option, adding "x" (padding bytes) to meet the 32-bit requirement from the table in the struct documentation.
So with:
struct.unpack("i","xx"+f.read(2))[0]
the result is (-1336248200-658802568j). Using:
struct.unpack("<i","xx"+f.read(2))[0]
provides the same result.
With:
struct.unpack(">i","xx"+f.read(2))[0]
The value is: (2021153456+2021178328j)
With:
struct.unpack(">i",f.read(2)+"xx")[0]
The value is: (1521514616-1143441288j)
With:
struct.unpack("<i",f.read(2)+"xx")[0]
The value is: (2021175386+2021185723j)
I also tried with numpy and "frombuffer":
np.frombuffer(f.read(1).encode("hex"),dtype=np.int16)
which gives: (24885+12386j)
So, any idea about what I'm doing wrong? I'd like to obtain the same value as in Octave.
What is the proper way of reading and interpreting the values in Python, so that I obtain the same value as Octave's fread with 'int16'? I've been searching the Internet for an answer but was not able to find a method that gives the same value.
It looks like the binary data in your question is 5ab0bbd8. To unpack signed 16-bit integers with struct.unpack, you use the 'h' format character. From that (23216+48088j) output, it appears the data is encoded as little-endian, so we need < as the first item in the format string.
from struct import unpack
data = b'\x5a\xb0\xbb\xd8'
# The wrong way
rel=int(data[:2].encode("hex"),16)
img=int(data[2:].encode("hex"),16)
c = complex(rel, img)
print c
# The right way
rel, img = unpack('<hh', data)
c = complex(rel, img)
print c
output
(23216+48088j)
(-20390-10053j)
Note that rel, img = unpack('<hh', data) will also work correctly on Python 3.
FWIW, in Python 3, you could also decode 2 bytes to a signed integer like this:
def int16_bytes_to_int(b):
    n = int.from_bytes(b, 'little')
    if n > 0x7fff:
        n -= 0x10000
    return n
The rough equivalent in Python 2 is:
def int16_bytes_to_int(b):
    lo, hi = b
    n = (ord(hi) << 8) + ord(lo)
    if n > 0x7fff:
        n -= 0x10000
    return n
But having to do that subtraction to handle signed numbers is annoying, and using struct.unpack is bound to be much more efficient.
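To convert the whole file at once, here is a hedged numpy sketch that mirrors the Octave fread/recombination above (the filename 'capture.bin' is a placeholder):
import numpy as np

# Read all interleaved I,Q samples as little-endian signed 16-bit ints.
raw = np.fromfile('capture.bin', dtype='<i2')
# Even indices are I (real), odd indices are Q (imaginary), mirroring
# data(1:2:end) + 1i*data(2:2:end) in the Octave code.
iq = raw[0::2].astype(np.float32) + 1j * raw[1::2].astype(np.float32)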

Porting struct.unpack from python 2.7 to 3

The following code works fine in Python 2.7:
def GetMaxNoise(data, max_noise):
    for byte in data:
        noise = ComputeNoise(struct.unpack('=B', byte)[0])
        if max_noise < noise:
            max_noise = noise
    return max_noise
where data is a string holding binary data (taken from a network packet).
I'm trying to port it to Python 3 and I get this:
File "Desktop/Test.py", line 2374, in GetMaxNoise
noise = ComputeNoise(struct.unpack('=B',byte)[0])
TypeError: 'int' does not support the buffer interface
How can I convert "data" to the appropriate type needed by unpack()?
Assuming the data variable is a string of bytes that you got from a binary file or a network packet, it is not processed the same in Python 2 and Python 3.
In Python 2, it is a str. When you iterate over it, you get single-byte strings, which you convert to int with struct.unpack('=B', byte)[0].
In Python 3, it is a bytes object. When you iterate over it, you directly get integers! So you should directly use:
def GetMaxNoise(data, max_noise):
    for byte in data:
        noise = ComputeNoise(byte)  # byte is already the int value of the byte
        if max_noise < noise:
            max_noise = noise
    return max_noise
From the docs of the struct module (https://docs.python.org/3.4/library/struct.html) I see that the unpack method expects its second argument to implement the buffer protocol, so it generally expects bytes.
Your data object seems to be of type bytes, as it's read from somewhere. When you iterate over it with the for loop, you end up with the byte variable holding single int values.
I don't know what your code is supposed to do and how, but maybe change the way you iterate over your data object to handle not ints but bytes of length == 1:
for i in range(len(data)):
    byte = data[i:i+1]
    print(byte)
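Putting the two answers together, here is a version-agnostic sketch of the original function (ComputeNoise is assumed to be defined elsewhere, as in the question's code):
import struct

def GetMaxNoise(data, max_noise):
    # Slicing (rather than indexing) yields length-1 byte strings on both
    # Python 2 and Python 3, so unpack('=B', ...) works unchanged.
    for i in range(len(data)):
        byte = data[i:i+1]
        noise = ComputeNoise(struct.unpack('=B', byte)[0])
        if max_noise < noise:
            max_noise = noise
    return max_noise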

simple reading of fortran binary data not so simple in python

I have a binary output file from a FORTRAN code and I want to read it in Python. (Reading it with FORTRAN and outputting text for Python to read is not an option; long story.) I can read the first record in a simplistic manner:
>>> import struct
>>> binfile = open('myfile', 'rb')
>>> pad1 = struct.unpack('i', binfile.read(4))[0]
>>> ver = struct.unpack('d', binfile.read(8))[0]
>>> pad2 = struct.unpack('i', binfile.read(4))[0]
>>> pad1, ver, pad2
(8, 3.13, 8)
Just fine. But this is a big file and I need to do this more efficiently. So I try:
>>> (pad1,ver,pad2)=struct.unpack('idi',binfile.read(16))
This won't run; it gives me an error telling me that unpack needs an argument with a length of 20. This makes no sense to me since, the last time I checked, 4+8+4=16. When I give in and replace the 16 with 20, it runs, but the three numbers are populated with numerical junk. Does anyone see what I am doing wrong? Thanks!
The size you get is due to alignment; try struct.calcsize('idi') and you'll see the size is actually 20 after alignment. To use the native byte order without alignment, prefix the format string with '=', i.e. struct.unpack('=idi', binfile.read(16)).
For more info on the struct module, check http://docs.python.org/2/library/struct.html
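A quick interactive check of the alignment difference described above:
>>> import struct
>>> struct.calcsize('idi')   # native alignment: 4 + 4 (padding) + 8 + 4
20
>>> struct.calcsize('=idi')  # native byte order, no alignment
16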
The struct module is mainly intended to interoperate with C structures and because of this it aligns the data members. idi corresponds to the following C structure:
struct {
    int int1;
    double double1;
    int int2;
};
double entries require 8 byte alignment in order to function efficiently (or even correctly) with most CPU load operations. That's why 4 bytes of padding are being added between int1 and double1, which increases the size of the structure to 20 bytes. The same padding is performed by the struct module, unless you suppress the padding by adding < (on little endian machines) or > (on big endian machines), or simply = at the beginning of the format string:
>>> struct.unpack('idi', d)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
struct.error: unpack requires a string argument of length 20
>>> struct.unpack('<idi', d)
(-1345385859, 2038.0682530887993, 428226400)
>>> struct.unpack('=idi', d)
(-1345385859, 2038.0682530887993, 428226400)
(d is a string of 16 random chars.)
I recommend using the array module to read a file that was written by FORTRAN with UNFORMATTED, SEQUENTIAL.
Your specific example, using arrays, would be as follows:
import array
binfile=open('myfile','rb')
pad = array.array('i')
ver = array.array('d')
pad.fromfile(binfile,1) # read the length of the record
ver.fromfile(binfile,1) # read the actual data written by FORTRAN
pad.fromfile(binfile,1) # read the length of the record
If you have FORTRAN records that write arrays of integers and doubles, which is very common, your python would look something like this:
import array
binfile=open('myfile','rb')
pad = array.array('i')
my_integers = array.array('i')
my_floats = array.array('d')
number_of_integers = 1000 # replace with how many you need to read
number_of_floats = 10000 # replace with how many you need to read
pad.fromfile(binfile,1) # read the length of the record
my_integers.fromfile(binfile,number_of_integers) # read the integer data
my_floats.fromfile(binfile,number_of_floats) # read the double data
pad.fromfile(binfile,1) # read the length of the record
A final comment: if the file contains characters, you can read those into an array as well, and then decode it into a string. Something like this:
import array
binfile=open('myfile','rb')
pad = array.array('i')
my_characters = array.array('B')
number_of_characters = 63 # replace with number of characters to read
pad.fromfile(binfile,1) # read the length of the record
my_characters.fromfile(binfile,number_of_characters ) # read the data
my_string = my_characters.tobytes().decode(encoding='utf_8')
pad.fromfile(binfile,1) # read the length of the record
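A hedged generalization of the pattern above: every unformatted sequential record is framed by matching length markers, so a small helper can read an arbitrary record and sanity-check it (the 4-byte marker width is an assumption; some compilers use 8 bytes):
import struct

def read_record(binfile):
    """Read one FORTRAN unformatted sequential record, returning its raw bytes."""
    head = struct.unpack('i', binfile.read(4))[0]   # leading record length
    payload = binfile.read(head)
    tail = struct.unpack('i', binfile.read(4))[0]   # trailing record length
    assert head == tail, "corrupt record: length markers disagree"
    return payload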

Storing 'struct' data to binary file

I need to store a binary file with a 12-byte header composed of 4 fields: sSamples (4-byte integer), sSampPeriod (4-byte integer), sSampSize (2-byte integer), and finally sParmKind (2-byte integer).
I'm using struct to pack my variables into the desired fields. Now that I have them defined separately, how can I merge them all to store the 12-byte header?
sSamples = struct.pack('i', nSamples) # 4-bytes integer
sSampPeriod = struct.pack('i', nSampPeriod) # 4-bytes integer
sSampSize = struct.pack('H', nSampSize) # 2-bytes integer / unsigned short
sParmKind = struct.pack('H', 9) # 2-bytes integer / unsigned short
In addition, I have an npVect float array of dimensionality D (numpy.ndarray, float32). How can I store this vector in the same binary file, after the header?
As Cody Brocious wrote, you can pack your entire header at once:
header = struct.pack('<iiHH', nSamples, nSampPeriod, nSampSize, nParmKind)
He also mentioned endianness, which is important if you want to pack your data so as to reliably unpack it on machines with different architectures. The < at the beginning of my format string specifies "pack this data using a little-endian convention".
As for the array, you'll have to pack its length in order to determine how many values to unpack when you read it again. Doing it all in one call:
flattened = npVect.ravel() # get a 1-D array of numbers
arrSize = len(flattened)
# pack header, count of numbers, and numbers, all in one call
packed = struct.pack('<iiHHi%df' % arrSize,
                     nSamples, nSampPeriod, nSampSize, nParmKind, arrSize, *flattened)
Depending on how big your array is likely to be, you could end up with a huge string representing the entire contents of your binary file, and you might want to look into alternatives to struct which don't require you to have the entire file in memory.
Unpacking:
fmt = '<iiHHi'
nSamples, nSampPeriod, nSampSize, nParmKind, arrSize = struct.unpack(fmt, packed)
# Use unpack_from to start reading after the packed header and count
flattened = struct.unpack_from('<%df' % arrSize, packed, struct.calcsize(fmt))
npVect = np.array(flattened, dtype='float32').reshape(  # your dimensions go here
)
EDIT: Oops, the array format isn't quite as simple as that :) The general idea holds, though: flatten your array into a list of numbers using any method you like, pack the number of values, then pack each value. On the other side, read the array as a flat list, then impose whatever structure you need on it.
EDIT: Changed format strings to use repeat specifiers, rather than string multiplication. Thanks to John Machin for pointing it out.
EDIT: Added numpy code to flatten the array before packing and reconstruct it after unpacking.
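Since the answer warns that one giant struct.pack call keeps the entire file contents in memory, here is a hedged alternative sketch that packs only the header with struct and streams the array with numpy (the filename 'out.bin' is a placeholder):
import struct
import numpy as np

with open('out.bin', 'wb') as f:
    arr = npVect.ravel().astype('<f4')  # little-endian float32, matching the '<...f' format
    f.write(struct.pack('<iiHHi', nSamples, nSampPeriod, nSampSize, nParmKind, arr.size))
    arr.tofile(f)                       # write raw bytes without building one huge string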
struct.pack returns a string (bytes on Python 3), so you can combine the fields simply by concatenation:
header = sSamples + sSampPeriod + sSampSize + sParmKind
assert len(header) == 12

Convert little endian string to integer [duplicate]

I have read samples out of a wave file using the wave module, but it gives the samples as a string; it's out of a wave file, so it's little-endian (for example, \x00).
What is the easiest way to convert this into a Python integer, or a numpy.int16 type? (It will eventually become a numpy.int16, so going directly there is fine.)
The code needs to work on both little-endian and big-endian processors.
The struct module converts packed data to Python values, and vice-versa.
>>> import struct
>>> struct.unpack("<h", "\x00\x05")
(1280,)
>>> struct.unpack("<h", "\x00\x06")
(1536,)
>>> struct.unpack("<h", "\x01\x06")
(1537,)
"h" means a short int, or 16-bit int. "<" means use little-endian.
struct is fine if you have to convert one or a small number of 2-byte strings to integers, but array and numpy itself are better options. Specifically, numpy.fromstring (called with the appropriate dtype argument) can directly convert the bytes from your string to an array of that dtype. (If numpy.little_endian is false, you'll then have to swap the bytes; basically, call the byteswap method on the array object you just built with fromstring.)
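A minimal sketch of that numpy route; specifying an explicit little-endian dtype ('<i2') sidesteps the numpy.little_endian/byteswap dance entirely (np.frombuffer is the modern spelling of the fromstring pattern):
import numpy as np

raw = b"\x00\x05\x00\x06\x01\x06"          # three packed little-endian int16 values
samples = np.frombuffer(raw, dtype='<i2')  # array([1280, 1536, 1537], dtype=int16)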
Kevin Burke's answer to this question works great when your binary string represents a single short integer, but if your string holds binary data representing multiple integers, you will need to add an additional 'h' for each additional integer that the string represents.
For Python 2
Convert Little Endian String that represents 2 integers
import struct
iValues = struct.unpack("<hh", "\x00\x04\x01\x05")
print(iValues)
Output: (1024, 1281)
Convert Little Endian String that represents 3 integers
import struct
iValues = struct.unpack("<hhh", "\x00\x04\x01\x05\x03\x04")
print(iValues)
Output: (1024, 1281, 1027)
Obviously, it's not realistic to always guess how many "h" characters are needed, so:
import struct
# A string that holds some unknown quantity of integers in binary form
strBinary_Values = "\x00\x04\x01\x05\x03\x04"
# Calculate the number of integers that are represented by binary string data
iQty_of_Values = len(strBinary_Values)/2
# Produce the string of required "h" values
h = "h" * int(iQty_of_Values)
iValues = struct.unpack("<"+h, strBinary_Values)
print(iValues)
Output: (1024, 1281, 1027)
For Python 3
import struct
# A bytes object holding some unknown quantity of integers in binary form.
# In Python 3 the data must be bytes, not str; round-tripping through an
# encoding such as UTF-8 would corrupt byte values >= 0x80.
strBinary_Values = b"\x00\x04\x01\x05\x03\x04"
# Calculate the number of integers represented by the binary data
iQty_of_Values = len(strBinary_Values) // 2
# Produce the string of required "h" values
h = "h" * iQty_of_Values
iValues = struct.unpack("<" + h, strBinary_Values)
print(iValues)
Output: (1024, 1281, 1027)
int(value[::-1].hex(), 16)
For example:
value = b'\xfd\xff\x00\x00\x00\x00\x00\x00'
print(int(value[::-1].hex(), 16))
65533
[::-1] reverses the bytes (undoing the little-endian order), .hex() transforms them into a hex string, and int(..., 16) parses that string as a base-16 integer.
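For what it's worth, Python 3's int.from_bytes does the same job without the reverse-and-hex round trip, taking the byte order directly (pass signed=True if the value may be negative):
value = b'\xfd\xff\x00\x00\x00\x00\x00\x00'
print(int.from_bytes(value, 'little'))                      # 65533
print(int.from_bytes(b'\xfd\xff', 'little', signed=True))   # -3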
