How to properly write and read BitArray objects? - python

I have an issue. I tryed to save my BitArray object into file. After that I want to read it and get the same BitArray object what I saved earlier. But result is not same with input.
from bitarray import bitarray
a = bitarray()
a += bitarray('{0:014b}'.format(15))
print(a.to01(), len(a))
with open('j.j', 'wb') as file:
a.tofile(file)
b = bitarray()
with open('j.j', 'rb') as file:
b.fromfile(file)
print(b.to01(), len(b))
Output:
00000000001111 14
0000000000111100 16
I see my object now is 2-byte representation. But I want to get 14-bit I saved. Do you have any ideas to make it right?

This isn't a great solution, but it does get rid of the 0's on the right.
from bitarray import bitarray
a = bitarray('{0:014b}'.format(15))
print(a.to01(), len(a)) #00000000001111 14
with open('j.j', 'wb') as file:
a.reverse()
a.tofile(file)
b = bitarray()
with open('j.j', 'rb') as file:
b.fromfile(file)
b.reverse()
print(b.to01(), len(b)) #0000000000001111 16
You could skip the reversals and just right shift b, but you would have to create a dynamic system that knows exactly how many bits to shift by. Another solution is to simply use bits in multiples of 8 in the first place. What are you saving here by removing 1 to 7 bits? You aren't saving anything in the file. Those bits will be padded regardless.

It's eather not a great solution, but it's a solution)
def encode():
encoded_bits = bitarray()
...
encoded_bits += bitarray('000') # 48:51 slice
zeroes_at_the_end = 8 - len(encoded_bits) % 8
if zeroes_at_the_end < 8:
encoded_bits[48:51] = bitarray('{0:03b}'.format(zeroes_at_the_end))
def decode(bites_sequence):
zeroes_at_the_end = ba2int(bites_sequence[48:51])
if zeroes_at_the_end != 0:
del bites_sequence[-zeroes_at_the_end:]
I just contain number of zeroes, which will appear after save/read in files and then easy delete those zeroes

Related

load hex file with intelhex format in python

I saved one data set(200 double data values) from Keil, it turns to be a .hex file with IntelHex format, I installed IntelHex in python and load it. Now the problem is I do not know how to interpret it, for example, this post
tells you to use dict, but it does not work for hex file including double data. my code:
from intelhex import IntelHex
ih = IntelHex() # create empty object
ih.loadhex('output.hex')
ihdict = ih.todict()
datastr = ""
startAddress = 536871952
while ihdict.get(startAddress) != None:
datastr += str("%0.2X" %ihdict.get(startAddress))
startAddress += 1
the file output.hex:
:020000042000DA
:0802A8003FB7809F5BC03F409F
:1002B000DFB56EEF5AB73F407F717CBF38BE3F401D
:1002C000DFD369EFE9B43F407F717CBF38BE3F4068
:1002D0003F895E9F44AF3F401F706A0F38B53F4073
:1002E0009F20584F10AC3F405F5F72AF2FB93F4027
:1002F000DFB56EEF5AB73F40DF5B7DEFADBE3F40ED
:10030000BFA364DF51B23F40DF62676FB1B33F40CC
:100310001F9E8C0F4FC63F405F0C6B2F86B53F4032
:10032000BF7542DF3AA13F403F4D689F26B43F4032
:100330009F2742CF13A13F40DF2D5BEF96AD3F409B
:100340009F915ACF48AD3F40DF874CEF43A63F40D7
:100350007FD2573FE9AB3F40FD721E7F398F3F4050
:10036000FF892FFFC4973F409D5311CFA9883F407D
:100370001F706A0F38B53F407F78663F3CB33F40FF
:100380001DFD148F7E8A3F401F954F8FCAA73F40A7
:100390005FC04D2FE0A63F401F0D3C8F069E3F40A3
:1003A0007F4A443F25A23F40DFE13DEFF09E3F40C2
:1003B0003F185C1F0CAE3F403F79379FBC9B3F40CE
:1003C000FF2F3EFF179F3F40DFBC586F5EAC3F40A2
:1003D000FD36287F1B943F403F3D419F9EA03F40FC
:1003E000FFFA317FFD983F409FF2354FF99A3F4029
:1003F0007D0511BF82883F40DF703B6FB89D3F4055
:10040000FF1143FF88A13F40DD60146F308A3F40F9
:100410001F49328F24993F407D230CBF11863F40F6
:100420009DBD29CFDE943F40BFED2EDF76973F4044
:10043000DDBA056FDD823F407D58183F2C8C3F4070
:100440007F3333BF99993F40DD9C0A6F4E853F4013
:100450007F3333BF99993F403DBC171FDE8B3F4030
:10046000BDF4185F7A8C3F403D16091F8B843F40D6
:100470003DE1FC9E707E3F40DD0D0DEF86863F40E6
:100480003F1F469F0FA33F403DFFF79EFF7B3F402E
:10049000DD42196FA18C3F40BDC6F65E637B3F40D5
:1004A0009D5AFB4EAD7D3F40FD4BE6FE25733F4020
:1004B0001D12D30E89693F403D8EF51EC77A3F401D
:1004C0003DBC171FDE8B3F409D7FE0CE3F703F401D
:1004D000BD6C055FB6823F40DD4903EFA4813F401C
:1004E000DD14F76E8A7B3F40DDF6FB6EFB7D3F40FF
:1004F000BD20E85E10743F40DD6EE86E37743F400B
:100500001DB1F78ED87B3F407DFCD33EFE693F4056
:100510009D4C274FA6933F407DD7EEBE6B773F4063
:10052000DDAADE6E556F3F401D55B38EAA593F4080
:100530009DF7CCCE7B663F40DDCFC3EEE7613F4009
:10054000BDF2C55EF9623F40BD7AD95EBD6C3F40E9
:100550005D90D82E486C3F40BD7AD95EBD6C3F405F
:100560003D67BD9EB35E3F403D0DCC9E06663F405D
:10057000BD88AD5EC4563F40BDA6A85E53543F4003
:10058000BDD4CA5E6A653F40FD95B0FE4A583F4003
:100590003DA3B39ED1593F405DF89D2EFC4E3F4098
:1005A000FD69E1FEB4703F40FD59BAFE2C5D3F404D
:1005B000BD63C8DE31643F407DB0B63E585B3F400E
:1005C0001DCD9F8EE64F3F405DEAC92EF5643F404A
:1005D000FD4993FEA4493F405D8E852EC7423F40B2
:1005E0007D4D88BE26443F401D3EA20E1F513F4018
:1005F0003D938C9E49463F40FD7E9F7EBF4F3F40CE
:10060000DDB1C8EE58643F40BD7F70DE3F383F40EB
:100610005DBCA72EDE533F409D4197CEA04B3F408F
:100620003D0B799E853C3F403D0B799E853C3F408C
:10063000BD7F70DE3F383F407D6B83BEB5413F409C
:10064000FD32827E19413F409D2A864E15433F4030
:10065000FDEFA1FEF7503F401DC4620E62313F40E6
:100660003D476F9EA3373F401D98930ECC493F40B6
:100670001D53608E29303F403DF4671EFA333F40E2
:100680003D048F1E82473F407D726D3EB9363F402C
:10069000FDF68B7EFB453F40FD5767FEAB333F4089
:1006A0005D7774AE3B3A3F40BD2C695E96343F4067
:1006B0009DF579CEFA3C3F40DD4C476EA6233F4086
:1006C0001D1E540E0F2A3F407DE36FBEF1373F40A1
:1006D000FD7C4C7E3E263F40DD8153EEC0293F40ED
:1006E0009D034ECE01273F409D1375CE893A3F4072
:1006F000DDC4336EE2193F407D444B3EA2253F40AE
:100700005DD84F2EEC273F407D0F3FBE871F3F40F7
:100710007DF143BEF8213F40BDE04B5EF0253F40F8
:100720005D8548AE42243F405DD84F2EEC273F40C8
:100730007D5B5CBE2D2E3F40FD40567E202B3F4012
:10074000BD514EDE28273F40FD2945FE94223F4003
:100750001DD9208E6C103F409DE552CE72293F403E
:100760003D91399EC81C3F407D16293E8B143F4069
:100770007D6930BE34183F409D9935CECC1A3F403C
:100780005DFD34AE7E1A3F409DB730CE5B183F40D2
:100790003DAF349E571A3F405D8C322E46193F4084
:1007A0003D81129E40093F405DC8282E64143F40A1
:1007B0007D34243E1A123F405DC13EAE601F3F4073
:1007C0003D2E0B1E97053F405D6E372EB71B3F40F9
:1007D0003D09269E04133F403DCD2F9EE6173F4026
:1007E000BD6F49DEB7243F403D5C2D1EAE163F4035
:1007F0001DE00A0E70053F403DBD089E5E043F406F
:100800009D890ECE44073F401D86190EC30C3F4004
:10081000BDD70EDE6B073F401DBB258EDD123F406E
:100820001DBB258EDD123F407D06023E03013F4089
:10083000BDD0245E68123F40FDA81B7ED40D3F4012
:100840005D14462E0A233F40FD12347E091A3F40B4
:100850009D180C4E0C063F401D681E0E340F3F4085
:100860001D510D8EA8063F403DD4191EEA0C3F4095
:100870001B94ED0DCAF63E405D40152EA00A3F4088
:100880009D64294EB2143F407DF82D3EFC163F403A
:100890001DD9208E6C103F405DE6232EF3113F40A2
:1008A0007DA526BE52133F40BDB913DEDC093F4093
:1008B0005D2904AE14023F40BBE5E2DD72F13E402B
:1008C0009D180C4E0C063F40BD84075EC2033F409E
:1008D0005D221A2E110D3F40FDC6167E630B3F4070
:0108E0009D7A
:00000001FF
Assuming the data represents a list of 64-bit floating point numbers that you want to decode, the process is to collect the appropriate number of octets and decode them as a double.
Reusing the structure you presented:
from intelhex import IntelHex
import struct
ih = IntelHex()
ih.loadhex('output.hex')
ihdict = ih.todict()
# Read all the data into a long list of int octets
data = []
startAddress = 536871952
while ihdict.get(startAddress) is not None:
data.append(ihdict.get(startAddress))
startAddress += 1
# slice the list into 8-byte bytearrays
bin_arr = [bytearray(data[n:n+8]) for n in range(0, len(data), 8)]
# unpack each bytearray as a double
# Filter for 8 byte arrays because len(data) is not divisible by 8.
# Is the data properly aligned?
doubles_list = [struct.unpack('d', b) for b in bin_arr if len(b) == 8]
print(doubles_list)
It may be worth mentioning that the above assumes a big endian byte ordering. I believe you can use < as part of the format definition to assume a little endian ordering. More information is available in the struct.unpack docs.

How to get multiple 32bit values with byte array?

I need to extract some number values out of a binary data stream.
the code below is working for me, but for sure there is a more suitable way to do this in python. Especially I was struggling a lot to find a better way to iterate over the array and get 4 byte as byte arrays from the buffer.
some hint for me?
outfile = io.BytesIO()
outfile.writelines(some binary data stream)
buf = outfile.getvalue()
blen = int(len(buf) / 4 );
for i in range(blen):
a = bytearray([0,0,0,0])
a[0] = buf[i*4]
a[1] = buf[i*4+1]
a[2] = buf[i*4+2]
a[3] = buf[i*4+3]
data = struct.unpack('<l', a)[0]
do something with data
Your question and accompanying pseudo-code are somewhat hazy in my opinion, but here's something that uses slices of buf to obtain the each group of 4 bytes needed—so if nothing else it's at least a bit more succinct (assuming I've correctly interpreted what you're asking):
import io
import struct
outfile = io.BytesIO()
outfile.writelines([b'\x00\x01\x02\x03',
b'\x04\x05\x06\x07'])
buf = outfile.getvalue()
for i in range(0, len(buf), 4):
data = struct.unpack('<l', buf[i:i+4])[0]
print(hex(data))
Output:
0x3020100
0x7060504

Python - Efficient way to flip bytes in a file?

I've got a folder full of very large files that need to be byte flipped by a power of 4. So essentially, I need to read the files as a binary, adjust the sequence of bits, and then write a new binary file with the bits adjusted.
In essence, what I'm trying to do is read a hex string hexString that looks like this:
"00112233AABBCCDD"
And write a file that looks like this:
"33221100DDCCBBAA"
(i.e. every two characters is a byte, and I need to flip the bytes by a power of 4)
I am very new to python and coding in general, and the way I am currently accomplishing this task is extremely inefficient. My code currently looks like this:
import binascii
with open(myFile, 'rb') as f:
content = f.read()
hexString = str(binascii.hexlify(content))
flippedBytes = ""
inc = 0
while inc < len(hexString):
flippedBytes += file[inc + 6:inc + 8]
flippedBytes += file[inc + 4:inc + 6]
flippedBytes += file[inc + 2:inc + 4]
flippedBytes += file[inc:inc + 2]
inc += 8
..... write the flippedBytes to file, etc
The code I pasted above accurately accomplishes what I need (note, my actual code has a few extra lines of: "hexString.replace()" to remove unnecessary hex characters - but I've left those out to make the above easier to read). My ultimate problem is that it takes EXTREMELY long to run my code with larger files. Some of my files I need to flip are almost 2gb in size, and the code was going to take almost half a day to complete one single file. I've got dozens of files I need to run this on, so that timeframe simply isn't practical.
Is there a more efficient way to flip the HEX values in a file by a power of 4?
.... for what it's worth, there is a tool called WinHEX that can do this manually, and only takes a minute max to flip the whole file.... I was just hoping to automate this with python so we didn't have to manually use WinHEX each time
You want to convert your 4-byte integers from little-endian to big-endian, or vice-versa. You can use the struct module for that:
import struct
with open(myfile, 'rb') as infile, open(myoutput, 'wb') as of:
while True:
d = infile.read(4)
if not d:
break
le = struct.unpack('<I', d)
be = struct.pack('>I', *le)
of.write(be)
Here is a little struct awesomeness to get you started:
>>> import struct
>>> s = b'\x00\x11\x22\x33\xAA\xBB\xCC\xDD'
>>> a, b = struct.unpack('<II', s)
>>> s = struct.pack('>II', a, b)
>>> ''.join([format(x, '02x') for x in s])
'33221100ddccbbaa'
To do this at full speed for a large input, use struct.iter_unpack

byte reverse AB CD to CD AB with python

I have a .bin file, and I want to simply byte reverse the hex data. Say for instance # 0x10 it reads AD DE DE C0, want it to read DE AD C0 DE.
I know there is a simple way to do this, but I am am beginner and just learning python and am trying to make a few simple programs to help me through my daily tasks. I would like to convert the whole file this way, not just 0x10.
I will be converting at start offset 0x000000 and blocksize/length is 1000000.
here is my code, maybe you can tell me what to do. i am sure i am just not getting it, and i am new to programming and python. if you could help me i would very much appreciate it.
def main():
infile = open("file.bin", "rb")
new_pos = int("0x000000", 16)
chunk = int("1000000", 16)
data = infile.read(chunk)
reverse(data)
def reverse(data):
output(data)
def output(data):
with open("reversed", "wb") as outfile:
outfile.write(data)
main()
and you can see the module for reversing, i have tried many different suggestions and it will either pass the file through untouched, or it will throw errors. i know module reverse is empty now, but i have tried all kinds of things. i just need module reverse to convert AB CD to CD AB.
thanks for any input
EDIT: the file is 16 MB and i want to reverse the byte order of the whole file.
In Python 3.4 you can use this:
>>> data = b'\xAD\xDE\xDE\xC0'
>>> swap_data = bytearray(data)
>>> swap_data.reverse()
the result is
bytearray(b'\xc0\xde\xde\xad')
In Python 2, the binary file gets read as a string, so string slicing should easily handle the swapping of adjacent bytes:
>>> original = '\xAD\xDE\xDE\xC0'
>>> ''.join([c for t in zip(original[1::2], original[::2]) for c in t])
'\xde\xad\xc0\xde'
In Python 3, the binary file gets read as bytes. Only a small modification is need to build another array of bytes:
>>> original = b'\xAD\xDE\xDE\xC0'
>>> bytes([c for t in zip(original[1::2], original[::2]) for c in t])
b'\xde\xad\xc0\xde'
You could also use the < and > endianess format codes in the struct module to achieve the same result:
>>> struct.pack('<2h', *struct.unpack('>2h', original))
'\xde\xad\xc0\xde'
Happy byte swapping :-)
data = b'\xAD\xDE\xDE\xC0'
reversed_data = data[::-1]
print(reversed_data)
# b'\xc0\xde\xde\xad'
Python3
bytes(reversed(b'\xAD\xDE\xDE\xC0'))
# b'\xc0\xde\xde\xad'
Python has a list operator to reverse the values of a list --> nameOfList[::-1]
So, I might store the hex values as string and put them into a list then try something like:
def reverseList(aList):
rev = aList[::-1]
outString = ""
for el in rev:
outString += el + " "
return outString

reorder byte order in hex string (python)

I want to build a small formatter in python giving me back the numeric
values embedded in lines of hex strings.
It is a central part of my formatter and should be reasonable fast to
format more than 100 lines/sec (each line about ~100 chars).
The code below should give an example where I'm currently blocked.
'data_string_in_orig' shows the given input format. It has to be
byte swapped for each word. The swap from 'data_string_in_orig' to
'data_string_in_swapped' is needed. In the end I need the structure
access as shown. The expected result is within the comment.
Thanks in advance
Wolfgang R
#!/usr/bin/python
import binascii
import struct
## 'uint32 double'
data_string_in_orig = 'b62e000052e366667a66408d'
data_string_in_swapped = '2eb60000e3526666667a8d40'
print data_string_in_orig
packed_data = binascii.unhexlify(data_string_in_swapped)
s = struct.Struct('<Id')
unpacked_data = s.unpack_from(packed_data, 0)
print 'Unpacked Values:', unpacked_data
## Unpacked Values: (46638, 943.29999999943209)
exit(0)
array.arrays have a byteswap method:
import binascii
import struct
import array
x = binascii.unhexlify('b62e000052e366667a66408d')
y = array.array('h', x)
y.byteswap()
s = struct.Struct('<Id')
print(s.unpack_from(y))
# (46638, 943.2999999994321)
The h in array.array('h', x) was chosen because it tells array.array to regard the data in x as an array of 2-byte shorts. The important thing is that each item be regarded as being 2-bytes long. H, which signifies 2-byte unsigned short, works just as well.
This should do exactly what unutbu's version does, but might be slightly easier to follow for some...
from binascii import unhexlify
from struct import pack, unpack
orig = unhexlify('b62e000052e366667a66408d')
swapped = pack('<6h', *unpack('>6h', orig))
print unpack('<Id', swapped)
# (46638, 943.2999999994321)
Basically, unpack 6 shorts big-endian, repack as 6 shorts little-endian.
Again, same thing that unutbu's code does, and you should use his.
edit Just realized I get to use my favorite Python idiom for this... Don't do this either:
orig = 'b62e000052e366667a66408d'
swap =''.join(sum([(c,d,a,b) for a,b,c,d in zip(*[iter(orig)]*4)], ()))
# '2eb60000e3526666667a8d40'
The swap from 'data_string_in_orig' to 'data_string_in_swapped' may also be done with comprehensions without using any imports:
>>> d = 'b62e000052e366667a66408d'
>>> "".join([m[2:4]+m[0:2] for m in [d[i:i+4] for i in range(0,len(d),4)]])
'2eb60000e3526666667a8d40'
The comprehension works for swapping byte order in hex strings representing 16-bit words. Modifying it for a different word-length is trivial. We can make a general hex digit order swap function also:
def swap_order(d, wsz=4, gsz=2 ):
return "".join(["".join([m[i:i+gsz] for i in range(wsz-gsz,-gsz,-gsz)]) for m in [d[i:i+wsz] for i in range(0,len(d),wsz)]])
The input params are:
d : the input hex string
wsz: the word-size in nibbles (e.g for 16-bit words wsz=4, for 32-bit words wsz=8)
gsz: the number of nibbles which stay together (e.g for reordering bytes gsz=2, for reordering 16-bit words gsz = 4)
import binascii, tkinter, array
from tkinter import *
infile_read = filedialog.askopenfilename()
with open(infile, 'rb') as infile_:
infile_read = infile_.read()
x = (infile_read)
y = array.array('l', x)
y.byteswap()
swapped = (binascii.hexlify(y))
This is a 32 bit unsigned short swap i achieved with code very much the same as "unutbu's" answer just a little bit easier to understand. And technically binascii is not needed for the swap. Only array.byteswap is needed.

Categories