Creating 16 and 24-bit integers from binary file - python

I'm modifying an existing Python app that reads a binary file. The format of the file is changing a bit. Currently, one field is defined as bytes 35-36 of the record. The specs also state that "...fields in the records will be character fields written in ASCII." Here's what the current working code looks like:
def to_i16( word ):
xx = struct.unpack( '2c', word )
xx = ( ord( xx[ 0 ] ) << 8 ) + ord( xx[ 1 ] )
return xx
val = to_i16( reg[ 34:36 ] )
But that field is being redefined as a bytes 35-37, so it'll be a 24-bit value. I detest working with binary files and am horrible at bit-twiddling. How do I turn that 3-byte value into a 24-bit integer?? I've tried a couple of code bits that I've found by googling but I don't think they are correct. Hard to be sure since I'm still waiting on the people that sent the sample 'new format' file to send me a text representation that shows the values I should be coming up with.

Simply read 24 bit (I assume in big endian, since the original code is in that format as well ):
val = struct.unpack('>I', b'\x00' + reg[34:37])

Related

Reading binary file into different hex "types" (8bit, 16bit, 32bit, ...)

I have a file which contains binary data. The content of this file is just one long line.
Example: 010101000011101010101
Originaly the content was an array of c++ objects with the following data types:
// Care pseudo code, just for visualisation
int64 var1;
int32 var2[50];
int08 var3;
I want to skip var1 and var3 and only extract the values of var2 into some readable decimal values. My idea was to read the file byte by byte and convert them into hex values. In the next step I though I could "combine" (append) 4 of those hex values to get one int32 value.
Example: 0x10 0xAA 0x00 0x50 -> 0x10AA0050
My code so far:
def append_hex(a, b):
return (a << 4) | b
with open("file.dat", "rb") as f:
counter = 0
tickdifcounter = 0
current_byte=" "
while True:
if (counter >= 8) and (counter < 208):
tickdifcounter+=1
if (tickdifcounter <= 4):
current_byte = append_hex(current_byte, f.read(1))
if (not current_byte):
break
val = ord(current_byte)
if (tickdifcounter > 4):
print hex(val)
tickdifcounter = 0
current_byte=""
counter+=1
if(counter == 209): #209 bytes = int64 + (int32*50) + int08
counter = 0
print
Now I have the problem that my append_hex is not working because the variables are strings so the bitshift is not working.
I am new to python so please give me hints when I can do something in a better way.
You can use struct module for reading binary files.
This can help you Reading a binary file into a struct in Python
A character can be converted to a int using the ord(x) method. In order to get the integer value of a multi-byte number, bitshift left. For example, from a earlier project:
def parseNumber(string, index):
return ord(string[index])<<24 + ord(string[index+1])<<16 + \
ord(string[index+2])<<8+ord(string[index+3])
Note this code assumes big-endian system, you will need to reverse the index for parsing little-endian code.
If you know exaclty what the size of the struct is going to be, (or can easily calculate it based on the size of the file) you are probably better of using the "struct" module.

Python - Efficient way to flip bytes in a file?

I've got a folder full of very large files that need to be byte flipped by a power of 4. So essentially, I need to read the files as a binary, adjust the sequence of bits, and then write a new binary file with the bits adjusted.
In essence, what I'm trying to do is read a hex string hexString that looks like this:
"00112233AABBCCDD"
And write a file that looks like this:
"33221100DDCCBBAA"
(i.e. every two characters is a byte, and I need to flip the bytes by a power of 4)
I am very new to python and coding in general, and the way I am currently accomplishing this task is extremely inefficient. My code currently looks like this:
import binascii
with open(myFile, 'rb') as f:
content = f.read()
hexString = str(binascii.hexlify(content))
flippedBytes = ""
inc = 0
while inc < len(hexString):
flippedBytes += file[inc + 6:inc + 8]
flippedBytes += file[inc + 4:inc + 6]
flippedBytes += file[inc + 2:inc + 4]
flippedBytes += file[inc:inc + 2]
inc += 8
..... write the flippedBytes to file, etc
The code I pasted above accurately accomplishes what I need (note, my actual code has a few extra lines of: "hexString.replace()" to remove unnecessary hex characters - but I've left those out to make the above easier to read). My ultimate problem is that it takes EXTREMELY long to run my code with larger files. Some of my files I need to flip are almost 2gb in size, and the code was going to take almost half a day to complete one single file. I've got dozens of files I need to run this on, so that timeframe simply isn't practical.
Is there a more efficient way to flip the HEX values in a file by a power of 4?
.... for what it's worth, there is a tool called WinHEX that can do this manually, and only takes a minute max to flip the whole file.... I was just hoping to automate this with python so we didn't have to manually use WinHEX each time
You want to convert your 4-byte integers from little-endian to big-endian, or vice-versa. You can use the struct module for that:
import struct
with open(myfile, 'rb') as infile, open(myoutput, 'wb') as of:
while True:
d = infile.read(4)
if not d:
break
le = struct.unpack('<I', d)
be = struct.pack('>I', *le)
of.write(be)
Here is a little struct awesomeness to get you started:
>>> import struct
>>> s = b'\x00\x11\x22\x33\xAA\xBB\xCC\xDD'
>>> a, b = struct.unpack('<II', s)
>>> s = struct.pack('>II', a, b)
>>> ''.join([format(x, '02x') for x in s])
'33221100ddccbbaa'
To do this at full speed for a large input, use struct.iter_unpack

Decoding Base64 string

I'm working with Python to do some string decoding and I am trying to understand what does this line of code...
for irradiance_data in struct.iter_unpack("qHHHHfff", irradiance_list_bytes):
print(irradiance_data)
In my case irradiance_list_bytes is something like this
"\xf5R\x960\x00\x00\x00\x009\x0f\xb4\x03\x01\x00d\x00\xa7D\xd1BC\x8c\x9d\xc2\xb3\xa5\xf0\xc0\xaer\x990\x00\x00\x00\x000\x0f\xb2\x03\x01\x00d\x00\x8f+\xd1B\x81\x9c\x9d\xc2\xf7\xfb\xe6\xc0u\x96\x9c0\x00\x00\x00\x00.\x0f\xb1\x03\x01\x00d\x00\xfe\x81\xd3B\x8a\r\x9e\xc2\xb4\xe7\x01\xc1\x1a\x7f\x9f0\x00\x00\x00\x00*\x0f\xb0\x03\x01\x00d\x00Z\xf5\xd3B\xedq\x9e\xc2&\xa1\x03\xc1\x94\x82\xa20\x00\x00\x00\x00-\x0f\xb1\x03\x01\x00d\x00\xb6\x8f\xd3Bg\xdf\x9d\xc2\x00\xad\xfd\xc0#\x93\xa50\x00\x00\x00\x000\x0f\xb2\x03\x01\x00d\x00\x95n\xd4B\x1d'\x9e\xc2\x1dW\x01\xc1\xd3\xa1\xa80\x00\x00\x00\x001\x0f\xb2\x03\x01\x00d\x00\x1d\xbc\xd3B\xeb\xca\x9d\xc2s\xbf\xf2\xc0.\xaf\xab0\x00\x00\x00\x001\x0f\xb2\x03\x01\x00d\x00\x13\xad\xd4BJx\x9d\xc2G(\xfb\xc0.\xc2\xae0\x00\x00\x00\x007\x0f\xb4\x03\x01\x00d\x00\xd1\xc9\xd4BS\xb8\x9d\xc2\xf0\xd9\xf8\xc0"
And the message error is
AttributeError: 'module' object has no attribute 'iter_unpack'
I beleive that, I have to change "qHHHHfff" to another string format, but I don't understand ?
The complete code is here...
import os
import glob
import exiftool
import base64
import struct
irradiance_list_tag = 'XMP:IrradianceList'
irradiance_calibration_measurement_golden_tag = 'XMP:IrradianceCalibrationMeasurementGolden'
irradiance_calibration_measurement_tag = 'XMP:IrradianceCalibrationMeasurement'
tags = [ irradiance_list_tag, irradiance_calibration_measurement_tag ]
directory = '/home/stagiaire/Bureau/AAAA/'
channels = [ 'RED' ]
index = 0
for channel in channels:
files = glob.glob(os.path.join(directory, '*' + channel + '*'))
with exiftool.ExifTool() as et:
metadata = et.get_tags_batch(tags, files)
for file_metadata in metadata:
irradiance_list = file_metadata[irradiance_list_tag]
irradiance_calibration_measurement = file_metadata[irradiance_calibration_measurement_tag]
irradiance_list_bytes = base64.b64decode(irradiance_list)
print(files[index])
index += 1
for irradiance_data in struct.iter_unpack("qHHHHfff", irradiance_list_bytes):
print(irradiance_data)
EDIT
So a stated by Strubbly, this is the solution for this question.
print struct.unpack("I",x[:4])
for i in range(8):
start = 4 + i*28
print struct.unpack("qHHHHfff",x[start:start+28])
struct.iter_unpack is only available in Python 3 and you are using Python 2.
There is no direct equivalent. struct.unpack will unpack one lump of 28 bytes (with that format string). struct.iter_unpack will unpack multiples of 28 bytes in Python 3.
If your data was suitable for struct.iter_unpack with that format code then you could do something like this:
for i in range(0,len(x),28):
print struct.unpack("qHHHHfff",x[i:i+28])
Unfortunately your sample data is not a multiple of 28 bytes long and so I would expect an error in Python 3 as well.
Without knowing about your data it is hard to correct your code but, at a guess, you data might have 4 bytes of some other data at the front. So that could be unpacked with something like this:
print struct.unpack("I",x[:4])
for i in range(8):
start = 4 + i*28
print struct.unpack("qHHHHfff",x[start:start+28])
In this example I have guessed that the first four bytes are an unsigned int but I have no way of knowing if that is correct. More information is needed.

Reading A Binary File In Fortran That Was Created By A Python Code

I have a binary file that was created using a Python code. This code mainly scripts a bunch of tasks to pre-process a set of data files. I would now like to read this binary file in Fortran. The content of the binary file is coordinates of points in a simple format e.g.: number of points, x0, y0, z0, x1, y1, z1, ....
These binary files were created using the 'tofile' function in numpy. I have the following code in Fortran so far:
integer:: intValue
double precision:: dblValue
integer:: counter
integer:: check
open(unit=10, file='file.bin', form='unformatted', status='old', access='stream')
counter = 1
do
if ( counter == 1 ) then
read(unit=10, iostat=check) intValue
if ( check < 0 ) then
print*,"End Of File"
stop
else if ( check > 0 ) then
print*, "Error Detected"
stop
else if ( check == 0 ) then
counter = counter + 1
print*, intValue
end if
else if ( counter > 1 ) then
read(unit=10, iostat=check) dblValue
if ( check < 0 ) then
print*,"End Of File"
stop
else if ( check > 0 ) then
print*, "Error Detected"
stop
else if ( check == 0 ) then
counter = counter + 1
print*,dblValue
end if
end if
end do
close(unit=10)
This unfortunately does not work, and I get garbage numbers (e.g 6.4731191026611484E+212, 2.2844499004808491E-279 etc.). Could someone give some pointers on how to do this correctly?
Also what would be a good way of writing and reading binary files interchangeably between Python and Fortran - as it seems like that is going to be one of the requirements of my application.
Thanks
Here's a trivial example of how to take data generated with numpy to Fortran the binary way.
I calculated 360 values of sin on [0,2π),
#!/usr/bin/env python3
import numpy as np
with open('sin.dat', 'wb') as outfile:
np.sin(np.arange(0., 2*np.pi, np.pi/180.,
dtype=np.float32)).tofile(outfile)
exported that with tofile to binary file 'sin.dat', which has a size of 1440 bytes (360 * sizeof(float32)), read that file with this Fortran95 (gfortran -O3 -Wall -pedantic) program which outputs 1. - (val**2 + cos(x)**2) for x in [0,2π),
program numpy_import
integer, parameter :: REAL_KIND = 4
integer, parameter :: UNIT = 10
integer, parameter :: SAMPLE_LENGTH = 360
real(REAL_KIND), parameter :: PI = acos(-1.)
real(REAL_KIND), parameter :: DPHI = PI/180.
real(REAL_KIND), dimension(0:SAMPLE_LENGTH-1) :: arr
real(REAL_KIND) :: r
integer :: i
open(UNIT, file="sin.dat", form='unformatted',&
access='direct', recl=4)
do i = 0,ubound(arr, 1)
read(UNIT, rec=i+1, err=100) arr(i)
end do
do i = 0,ubound(arr, 1)
r = 1. - (arr(i)**2. + cos(real(i*DPHI, REAL_KIND))**2)
write(*, '(F6.4, " ")', advance='no')&
real(int(r*1E6+1)/1E6, REAL_KIND)
end do
100 close(UNIT)
write(*,*)
end program numpy_import
thus if val == sin(x), the numeric result must in good approximation vanish for float32 types.
And indeed:
output:
360 x 0.0000
So thanks to this great community, from all the advise I got, and a little bit of tinkering around, I think I figured out a stable solution to this problem, and I wanted to share with you all this answer. I will provide a minimal example here, where I want to write a variable size array from Python into a binary file, and read it using Fortran. I am assuming that the number of rows numRows and number of columns numCols are also written along with the full array datatArray. The following Python script writeBin.py writes the file:
import numpy as np
# Read in the numRows and numCols value
# Read in the array values
numRowArr = np.array([numRows], dtype=np.float32)
numColArr = np.array([numCols], dtype=np.float32)
fileObj = open('pybin.bin', 'wb')
numRowArr.tofile(fileObj)
numColArr.tofile(fileObj)
for i in range(numRows):
lineArr = dataArray[i,:]
lineArr.tofile(fileObj)
fileObj.close()
Following this, the fortran code to read the array from the file can be programmed as follows:
program readBin
use iso_fortran_env
implicit none
integer:: nR, nC, i
real(kind=real32):: numRowVal, numColVal
real(kind=real32), dimension(:), allocatable:: rowData
real(kind=real32), dimension(:,:), allocatable:: fullData
open(unit=10,file='pybin.bin',form='unformatted',status='old',access='stream')
read(unit=10) numRowVal
nR = int(numRowVal)
read(unit=10) numColVal
nC = int(numColVal)
allocate(rowData(nC))
allocate(fullData(nR,nC))
do i = 1, nR
read(unit=10) rowData
fullData(i,:) = rowData(:)
end do
close(unit=10)
end program readBin
The main point that I gathered from the discussion on this thread is to match the read and the write as much as possible, with precise specifications of the data types to be read, the way they are written etc. As you may note, this is a made up example, so there may be some things here and there that are not perfect. However, I have used this now to program a finite element program, and the mesh data was where I used this binary read/write - and it worked very well.
P.S: In case you find some typo, please let me know, and I will edit it out rightaway.
Thanks a lot.

python binary file operation

I am getting trouble working with binary file in python.
Here is what I want to do:
I have a binary file in which I want to modify a sequence by another.
The sequence to replace is 'ASBF' .
And I want to replace it by a number.
It worked just fine when I used python 2.7.
But now in python 3.3 there is a difference between bytes and str and I think it is my problem.
Here was the code I was doing:
#that is the number I want to put instead of the hexa sequence
number = 1703518678
#I put it in an array
number_array = []
number_array.append(number & 0xFF)
number_array.append(number >> 8 & 0xFF)
number_array.append(number >> 16 & 0xFF)
number_array.append(number >> 24 & 0xFF)
f = open(fichier_bin, 'rb')
lines = f.readlines()
f.close()
f = open(fichier_bin, 'wb')
for line in lines:
f.write(line.replace('ASBF', struct.pack('BBBB', number_array[0], number_array[1], number_array[2], number_array[3]))) #replace ASBF by the number
f.close()
I tried other things to workaround the problem but I can't figure out how to replace a sequence by another in a binary file.
I would like 41534246 which is 'ASBF' in hexa to become 6589A1D6 which is 1703518678 in hexa.
EDIT:
And here is the error I get
f.write(ligne.replace('ASBF', struct.pack('BBBB', number_array[0], number_array[1], number_array[2], number_array[3]))) #replace ASBF by the number
TypeError: expected bytes, bytearray or buffer compatible object
And I really do not understand how to get through this.
EDIT2:
The problem I got was from the way I opened the file.
Now that I use with instead of just open, my program is working just fine.

Categories