I need to extract some numeric values from a binary data stream.
The code below works for me, but I'm sure there is a more suitable way to do this in Python. In particular, I struggled to find a better way to iterate over the buffer and pull out 4 bytes at a time as byte arrays.
Any hints?
outfile = io.BytesIO()
outfile.writelines(some binary data stream)
buf = outfile.getvalue()
blen = int(len(buf) / 4)
for i in range(blen):
    a = bytearray([0,0,0,0])
    a[0] = buf[i*4]
    a[1] = buf[i*4+1]
    a[2] = buf[i*4+2]
    a[3] = buf[i*4+3]
    data = struct.unpack('<l', a)[0]
    do something with data
Your question and accompanying pseudo-code are somewhat hazy in my opinion, but here's something that uses slices of buf to obtain each group of 4 bytes needed, so if nothing else it's at least a bit more succinct (assuming I've correctly interpreted what you're asking):
import io
import struct
outfile = io.BytesIO()
# ... write the binary data stream to outfile here ...
buf = outfile.getvalue()
for i in range(0, len(buf), 4):
    data = struct.unpack('<l', buf[i:i+4])[0]
    # do something with data
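If the buffer holds nothing but 4-byte values, struct.iter_unpack (Python 3.4+) can walk it without any manual slicing. A minimal sketch, assuming little-endian signed 32-bit integers as in the question; the sample buf and the choice to drop incomplete trailing bytes are assumptions made for illustration:

```python
import struct

# Hypothetical sample: three little-endian signed 32-bit values plus one stray byte.
buf = struct.pack('<3l', 1, -2, 70000) + b'\xff'

# iter_unpack requires a length that is a multiple of the item size,
# so drop any incomplete trailing group first.
usable = len(buf) - len(buf) % 4

# Each item yielded by iter_unpack is a 1-tuple; unpack it in the loop target.
values = [data for (data,) in struct.iter_unpack('<l', buf[:usable])]
print(values)  # [1, -2, 70000]
```

Whether to drop, pad, or reject a partial trailing group depends on the data source, so treat that line as a placeholder for your own policy.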
I saved one data set (200 double values) from Keil. It turned out to be a .hex file in Intel HEX format, so I installed the IntelHex package for Python and loaded the file. Now the problem is that I do not know how to interpret it. For example, this post
tells you to use a dict, but that does not work for a hex file containing double data. My code:
from intelhex import IntelHex
ih = IntelHex() # create empty object
ih.loadhex("output.hex") # load the file shown below
ihdict = ih.todict()
datastr = ""
startAddress = 536871952
while ihdict.get(startAddress) != None:
    datastr += str("%0.2X" %ihdict.get(startAddress))
    startAddress += 1
the file output.hex:
Assuming the data represents a list of 64-bit floating point numbers that you want to decode, the process is to collect the appropriate number of octets and decode them as a double.
Reusing the structure you presented:
from intelhex import IntelHex
import struct
ih = IntelHex()
ih.loadhex("output.hex")  # load the file from the question
ihdict = ih.todict()
# Read all the data into a long list of int octets
data = []
startAddress = 536871952
while ihdict.get(startAddress) is not None:
    data.append(ihdict.get(startAddress))
    startAddress += 1
# slice the list into 8-byte bytearrays
bin_arr = [bytearray(data[n:n+8]) for n in range(0, len(data), 8)]
# unpack each bytearray as a double
# Filter for 8 byte arrays because len(data) is not divisible by 8.
# Is the data properly aligned?
# struct.unpack returns a 1-tuple, so take element [0] to get the float itself
doubles_list = [struct.unpack('d', b)[0] for b in bin_arr if len(b) == 8]
It may be worth mentioning that the 'd' format above uses the machine's native byte order, which is little-endian on most desktop hardware. You can put < at the start of the format string to force little-endian decoding, or > to force big-endian. More information is available in the struct.unpack docs.
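To make the byte order explicit rather than dependent on the host machine, the format string can carry a prefix. A minimal sketch, assuming the octets are little-endian doubles; the sample octets list is made up here and stands in for the data list read from the hex file:

```python
import struct

# Hypothetical octets: two doubles stored little-endian, plus 3 leftover bytes.
octets = list(struct.pack('<2d', 1.5, -0.25)) + [0, 0, 0]

raw = bytes(octets)
count = len(raw) // 8  # number of complete 8-byte groups

# '<' fixes the byte order to little-endian regardless of the host machine;
# use '>' instead if the data was written big-endian.
# unpack_from tolerates extra bytes after the requested items.
doubles_list = list(struct.unpack_from('<%dd' % count, raw))
print(doubles_list)  # [1.5, -0.25]
```

If the decoded numbers look like garbage (huge exponents, denormals), trying the other prefix is usually the first thing to check.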
I am trying to read bytes from an image, and get all the int (16 bit) values from that image.
After I parsed the image header, I got to the pixel values. The value I get for a pair of bytes such as b"\xd4\x00" is incorrect. In this case it should be 54272, not 3392.
These are the relevant parts of the code:
I use a generator to get the bytes:
import itertools

def osddef_generator(in_file):
    with open(in_file, mode='rb') as f:
        dat = f.read()
        for byte in dat:
            yield byte

def take_slice(in_generator, size):
    return ''.join(str(chr(i)) for i in itertools.islice(in_generator, size))

def take_single_pixel(in_generator):
    pix = itertools.islice(in_generator, 2)
    hex_list = [hex(i) for i in pix]
    hex_str = "".join(hex_list)[2:].replace("0x", '')
    intval = int(hex_str, 16)
    print("hex_list: ", hex_list)
    print("hex_str: ", hex_str)
    print("intval: ", intval)
After I get the header correctly using the take_slice method, I get to the part with the pixel values, where I use the take_single_pixel method.
Here, I get the bad results.
This is what I get:
hex_list: ['0xd4', '0x0']
hex_str: d40
intval: 3392
But the actual sequence of bytes that should be interpreted is \xd4\x00, which equals 54272, so I expected hex_list = ['0xd4', '0x00'] and hex_str = d400.
Something goes wrong whenever the second byte of a pair is \x00.
Got any ideas? Thanks!
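Your approach fails because hex() does not zero-pad its result: hex(0) is '0x0', so the second byte contributes a single digit and you end up with d40 instead of d400. There are much better ways of converting bytes to integers:
int.from_bytes() takes bytes input, and a byte order argument:
>>> int.from_bytes(b"\xd4\x00", 'big')
54272
>>> int.from_bytes(b"\xd4\x00", 'little')
212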
The struct.unpack() function lets you convert a whole series of bytes to integers following a pattern:
>>> import struct
>>> struct.unpack('!4H', b'\xd4\x00\xd4\x00\xd4\x00\xd4\x00')
(54272, 54272, 54272, 54272)
The array module lets you read binary data representing homogeneous integer data into a memory structure efficiently:
>>> import array
>>> arr = array.array('H', fileobject.read())
However, array can't be told what byte order to use. You'd have to determine the current architecture byte order and call arr.byteswap() to reverse order if the machine order doesn't match the file order.
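A minimal sketch of that byte-order check, assuming the file format is known to be big-endian; the raw bytes and the file_byteorder name are made up for illustration:

```python
import array
import sys

raw = b'\xd4\x00\x00\x01'   # two big-endian 16-bit values
file_byteorder = 'big'      # assumption: known from the file format

arr = array.array('H')      # 'H' is an unsigned 16-bit item on common platforms
arr.frombytes(raw)          # interpreted in the machine's native byte order

# Swap only when the machine's order differs from the file's order.
if sys.byteorder != file_byteorder:
    arr.byteswap()

print(arr.tolist())  # [54272, 1]
```

For data read straight from a file object, arr.fromfile(f, n) fills the array the same way, with n being the number of items rather than bytes.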
When reading image data, it is almost always preferable to use the struct module to do the parsing. You generally then use file.read() calls with specific sizes; if the header consists of 10 bytes, use:
headerinfo = struct.unpack('<expected header pattern for 10 bytes>', f.read(10))
and go from there. For examples, look at the Pillow / PIL image plugins source code; here is how the Blizzard Mipmap image format header is read:
def _read_blp_header(self):
    self._blp_compression, = struct.unpack("<i", self.fp.read(4))
    self._blp_encoding, = struct.unpack("<b", self.fp.read(1))
    self._blp_alpha_depth, = struct.unpack("<b", self.fp.read(1))
    self._blp_alpha_encoding, = struct.unpack("<b", self.fp.read(1))
    self._blp_mips, = struct.unpack("<b", self.fp.read(1))
    self._size = struct.unpack("<II", self.fp.read(8))
    if self.magic == b"BLP1":
        # Only present for BLP1
        self._blp_encoding, = struct.unpack("<i", self.fp.read(4))
        self._blp_subtype, = struct.unpack("<i", self.fp.read(4))
    self._blp_offsets = struct.unpack("<16I", self.fp.read(16 * 4))
    self._blp_lengths = struct.unpack("<16I", self.fp.read(16 * 4))
Because struct.unpack() always returns tuples, you can assign the individual elements of a tuple to name1, name2, ... names on the left-hand side, including single_name, = assignments to extract a single result.
The separate set of read calls above could also be compressed into fewer calls:
comp, enc, adepth, aenc, mips, *size = struct.unpack("<i4b2I", self.fp.read(16))
if self.magic == b"BLP1":
    # Only present for BLP1
    enc, subtype = struct.unpack("<2i", self.fp.read(8))
followed by specific attribute assignments.
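Putting the pieces together for the 16-bit pixel case, here is a rough sketch of reading a fixed-size header and then a pixel with struct. The 10-byte header layout (magic, width, height, depth), the Header tuple, and read_header are all hypothetical; substitute the real layout of your image format:

```python
import io
import struct
from collections import namedtuple

# Hypothetical 10-byte header: 4-byte magic, then width, height and bit depth
# as little-endian unsigned 16-bit integers.
HEADER = struct.Struct('<4sHHH')
Header = namedtuple('Header', 'magic width height depth')

def read_header(fp):
    """Read exactly one header from fp and return it as a named tuple."""
    raw = fp.read(HEADER.size)
    if len(raw) != HEADER.size:
        raise ValueError('truncated header')
    return Header._make(HEADER.unpack(raw))

# In-memory stand-in for an open binary file: header followed by one pixel.
fp = io.BytesIO(b'IMG0' + struct.pack('<HHH', 640, 480, 16) + b'\xd4\x00')

header = read_header(fp)
pixel, = struct.unpack('>H', fp.read(2))  # one big-endian 16-bit pixel
print(header.width, header.height, pixel)  # 640 480 54272
```

Precompiling the format with struct.Struct avoids re-parsing the format string on every call and gives you the expected size via HEADER.size.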
I've got a folder full of very large files whose bytes need to be flipped in groups of 4. Essentially, I need to read each file as binary, reverse the order of the bytes within every 4-byte group, and then write a new binary file with the reordered bytes.
In essence, what I'm trying to do is read a hex string hexString that looks like this:
And write a file that looks like this:
(i.e. every two characters is a byte, and I need to flip the bytes by a power of 4)
I am very new to python and coding in general, and the way I am currently accomplishing this task is extremely inefficient. My code currently looks like this:
import binascii
with open(myFile, 'rb') as f:
    content = f.read()
hexString = str(binascii.hexlify(content))
flippedBytes = ""
inc = 0
while inc < len(hexString):
    flippedBytes += hexString[inc + 6:inc + 8]
    flippedBytes += hexString[inc + 4:inc + 6]
    flippedBytes += hexString[inc + 2:inc + 4]
    flippedBytes += hexString[inc:inc + 2]
    inc += 8
# ..... write the flippedBytes to file, etc
The code I pasted above accurately accomplishes what I need (note, my actual code has a few extra lines of: "hexString.replace()" to remove unnecessary hex characters - but I've left those out to make the above easier to read). My ultimate problem is that it takes EXTREMELY long to run my code with larger files. Some of my files I need to flip are almost 2gb in size, and the code was going to take almost half a day to complete one single file. I've got dozens of files I need to run this on, so that timeframe simply isn't practical.
Is there a more efficient way to flip the HEX values in a file by a power of 4?
.... for what it's worth, there is a tool called WinHEX that can do this manually, and only takes a minute max to flip the whole file.... I was just hoping to automate this with python so we didn't have to manually use WinHEX each time
You want to convert your 4-byte integers from little-endian to big-endian, or vice-versa. You can use the struct module for that:
import struct
with open(myfile, 'rb') as infile, open(myoutput, 'wb') as of:
    while True:
        d = infile.read(4)
        if not d:
            break  # end of file
        le = struct.unpack('<I', d)
        be = struct.pack('>I', *le)
        of.write(be)
Here is a little struct awesomeness to get you started:
>>> import struct
>>> s = b'\x00\x11\x22\x33\xAA\xBB\xCC\xDD'
>>> a, b = struct.unpack('<II', s)
>>> s = struct.pack('>II', a, b)
>>> ''.join([format(x, '02x') for x in s])
'33221100ddccbbaa'
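To do this at full speed for a large input, use struct.iter_unpack, which iterates over a buffer of equally sized items without a separate read() and unpack() call per item.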
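A rough sketch of that idea, reading the file in large chunks and swapping every 4-byte word with struct.iter_unpack. The function name swap_words, the chunk size, and the decision to copy incomplete trailing bytes unchanged are assumptions, not something the question specifies:

```python
import io
import struct

CHUNK_SIZE = 4 * 1024 * 1024  # assumption: any multiple of 4 works

def swap_words(infile, outfile, chunk_size=CHUNK_SIZE):
    """Copy infile to outfile, reversing the byte order of every 4-byte word."""
    while True:
        chunk = infile.read(chunk_size)
        if not chunk:
            break
        usable = len(chunk) - len(chunk) % 4
        # Read the words as little-endian, then write them back big-endian.
        words = [word for (word,) in struct.iter_unpack('<I', chunk[:usable])]
        outfile.write(struct.pack('>%dI' % len(words), *words))
        # Assumption: bytes that do not fill a whole word are copied unchanged.
        outfile.write(chunk[usable:])

# In-memory demonstration; real use would pass files opened in 'rb' and 'wb' mode.
src = io.BytesIO(bytes.fromhex('00112233aabbccdd'))
dst = io.BytesIO()
swap_words(src, dst)
print(dst.getvalue().hex())  # 33221100ddccbbaa
```

Working on raw bytes in chunks avoids both the hex-string round trip and loading a 2 GB file into memory at once.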
I have raw data from a camera, which is in the mono12packed format. This is an interleaved bit format that stores two 12-bit integers in 3 bytes to eliminate overhead. Explicitly, the memory layout for each 3 bytes looks like this:
Byte 1 = Pixel0 Bits 11-4
Byte 2 = Pixel1 Bits 3-0 + Pixel0 Bits 3-0
Byte 3 = Pixel1 Bits 11-4
I have a file, where all the bytes can be read from using binary read, let's assume it is called binfile.
To get the pixeldata from the file I do:
from bitstring import BitArray as Bit
f = open(binfile, 'rb')
bytestring = f.read()
a = []
for i in range(len(bytestring)//3): #reading 2 pixels = 3 bytes at a time
    s = Bit(bytes = bytestring[i*3:i*3+3], length = 24)
    p0 = s[0:8]+s[12:16]
    p1 = s[16:]+s[8:12]
    a.append(p0.uint)
    a.append(p1.uint)
which works, but is horribly slow and I would like to do that more efficiently, because I have to do that for a huge amount of data.
My idea is, that by reading more than 3 bytes at a time I could spare some time in the conversion step, but I can't figure a way how to do that.
Another idea is, since the bits come in packs of 4, maybe there is a way to work on nibbles rather than on bits.
Data example:
The bytes
lead to the data
[117, 120, 93, 105]
Have you tried bitwise operators? Maybe that's a faster way:
with open('binfile.txt', 'rb') as binfile:
    bytestring = list(bytearray(binfile.read()))
a = []
for i in range(0, len(bytestring), 3):
    px_bytes = bytestring[i:i+3]
    p0 = (px_bytes[0] << 4) | (px_bytes[1] & 0x0F)
    p1 = (px_bytes[2] << 4) | (px_bytes[1] >> 4 & 0x0F)
    a.append(p0)
    a.append(p1)
print(a)
This also outputs:
[117, 120, 93, 105]
Hope it helps!
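A small variation on the same bitwise idea, sketched with strided slices so that the loop handles whole bytes without index arithmetic. The function name unpack_mono12packed is made up, and the sample bytes were worked out by hand from the layout in the question so that they decode to the values quoted there; they are not copied from the original post:

```python
def unpack_mono12packed(raw):
    """Decode mono12packed bytes into a list of 12-bit pixel values."""
    usable = len(raw) - len(raw) % 3   # ignore an incomplete trailing group
    firsts = raw[0:usable:3]           # Pixel0 bits 11-4
    middles = raw[1:usable:3]          # Pixel1 bits 3-0 (high nibble), Pixel0 bits 3-0 (low nibble)
    lasts = raw[2:usable:3]            # Pixel1 bits 11-4
    pixels = []
    for first, middle, last in zip(firsts, middles, lasts):
        pixels.append((first << 4) | (middle & 0x0F))
        pixels.append((last << 4) | (middle >> 4))
    return pixels

# Hand-derived sample: two 3-byte groups holding four pixels.
sample = bytes([0x07, 0x85, 0x07, 0x05, 0x9D, 0x06])
print(unpack_mono12packed(sample))  # [117, 120, 93, 105]
```

The same three slices map directly onto NumPy arrays if the pure-Python loop is still too slow for the full data set.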
I'm still pretty new to python. I've been able to write a program that will read in a file from binary and stores the data that's there in a few arrays. Now that I've been able to complete a few other tasks with this program, I'm trying to go back through all my code and see where I can make it more efficient, learning Python better along the way. In particular, I'm trying to update the reading and storing of data from the file. Using numpy's fromfile is MUCH, MUCH faster at unpacking data than the struct.unpack method, and works wonderfully for a 1D array structure. However, I have some of the data stored in 2D arrays. I am seemingly stuck on how to implement the same type of storing in the 2D array. Does anyone have any ideas or hints as to how I may be able to perform this?
My basic program structure is as follows:
from numpy import fromfile
import numpy as np
file = open(theFilePath,'rb')
####### File Header #########
reservedParse = 4
fileHeaderBytes = 4 + int(131072/reservedParse) #Parsing out the bins in the file header
fileHeaderArray = np.zeros(fileHeaderBytes)
fileHeaderArray[0] = fromfile(file, dtype='<I', count=1) #File Index; 4 Bytes
fileHeaderArray[1] = fromfile(file, dtype='<I', count=1) #The Number of Packets; 4 bytes
fileHeaderArray[2] = fromfile(file, dtype='<Q', count=1) #Timestamp; 16 bytes; 2, 8-byte.
fileHeaderArray[3] = fromfile(file, dtype='<Q', count=1)
fileHeaderArray[4:] = fromfile(file, dtype='<I', count=int(131072/reservedParse)) #Empty header space
####### Data Packets #########
#Note: Each packet begins with a header containing information about the data stream followed by the data.
packets = int(fileHeaderArray[1]) #The number of packets in the data stream
dataLength = int(28672)
packHeader = np.zeros(14*packets).reshape((packets,14))
data = np.zeros(dataLength*packets).reshape((packets,dataLength))
for i in range(packets):
    packHeader[i][0] = fromfile(file, dtype='>H', count=1) #Model Num
    packHeader[i][1] = fromfile(file, dtype='>H', count=1) #Packet ID
    packHeader[i][2] = fromfile(file, dtype='>I', count=1) #Coarse Timestamp
    ....#Continuing on
    packHeader[i][13] = fromfile(file, dtype='>i', count=1) #4 bytes of reserved space
    data[i] = fromfile(file, dtype='<h', count=dataLength) #Actual data
Essentially this is what I have right now. Is there a way I can do this without doing the loop? Going through that loop does not seem particularly fast or numpy-ish.
For reference, the for-loop structure using unpack and not numpy is:
packHeader = [[0 for x in range(14)] for y in range(packets)]
data = [[0 for x in range(dataLength)] for y in range(packets)]
for i in range(packets):
    packHeader[i][0] = unpack('>H', file.read(2)) #Model Num
    packHeader[i][1] = unpack('>H', file.read(2)) #Packet ID
    packHeader[i][2] = unpack('>I', file.read(4)) #Coarse Timestamp
    ....#Continuing on
    packHeader[i][13] = unpack('>i', file.read(4)) #4 bytes of reserved space
    packHeader[i]=list(chain(*packHeader[i])) #Deals with the tuple issue ((x,),(y,),...) -> (x,y,...)
    data[i] = [unpack('<h', file.read(2)) for j in range(dataLength)] #Actual data
    data[i] = list(chain(*data[i])) #Deals with the tuple issue ((x,),(y,),...) -> (x,y,...)
Let me know if any clarification is needed.
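One possible direction, sketched under assumptions: a NumPy structured dtype can describe one whole packet (header fields plus data block), so a single fromfile or frombuffer call reads every packet and the 2D data array falls out as a field. The layout below is hypothetical and shortened to three header fields and 4 samples per packet; the real dtype would need all 14 header fields in file order, which the question elides:

```python
import struct
import numpy as np

SAMPLES_PER_PACKET = 4  # hypothetical; the question uses 28672

# Hypothetical, shortened packet layout: three big-endian header fields
# followed by a block of little-endian 16-bit samples.
packet_dtype = np.dtype([
    ("model_num", ">u2"),
    ("packet_id", ">u2"),
    ("coarse_timestamp", ">u4"),
    ("data", "<i2", (SAMPLES_PER_PACKET,)),
])

# Build three fake packets in memory to stand in for the file contents.
raw = b"".join(
    struct.pack(">HHI", 7, packet_id, 1000 + packet_id)
    + struct.pack("<4h", *range(packet_id, packet_id + 4))
    for packet_id in range(3)
)

# One call parses every packet; with a real file this would be
# np.fromfile(file, dtype=packet_dtype, count=packets).
records = np.frombuffer(raw, dtype=packet_dtype, count=3)

data = records["data"]              # 2D array, shape (packets, samples)
packet_ids = records["packet_id"]   # 1D array, one entry per packet
print(data.shape, packet_ids.tolist())
```

Each field keeps its own byte order, so the mixed big-endian header and little-endian samples from the question can live in the same dtype.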