I am having trouble working with a binary file in Python.
Here is what I want to do:
I have a binary file in which I want to replace one byte sequence with another.
The sequence to replace is 'ASBF'.
I want to replace it with a number.
This worked fine in Python 2.7.
But Python 3.3 distinguishes between bytes and str, and I think that is my problem.
Here is the code I was using:
#that is the number I want to put instead of the hexa sequence
number = 1703518678
#I put it in an array
number_array = []
number_array.append(number & 0xFF)
number_array.append(number >> 8 & 0xFF)
number_array.append(number >> 16 & 0xFF)
number_array.append(number >> 24 & 0xFF)
f = open(fichier_bin, 'rb')
lines = f.readlines()
f.close()
f = open(fichier_bin, 'wb')
for line in lines:
    f.write(line.replace('ASBF', struct.pack('BBBB', number_array[0], number_array[1], number_array[2], number_array[3]))) #replace ASBF by the number
f.close()
I tried other things to work around the problem, but I can't figure out how to replace one sequence with another in a binary file.
I would like 41534246, which is 'ASBF' in hex, to become 6589A1D6, which is 1703518678 in hex.
EDIT:
And here is the error I get
f.write(ligne.replace('ASBF', struct.pack('BBBB', number_array[0], number_array[1], number_array[2], number_array[3]))) #replace ASBF by the number
TypeError: expected bytes, bytearray or buffer compatible object
And I really do not understand how to get through this.
EDIT2:
The problem came from the way I opened the file.
Now that I use with instead of a bare open, my program works fine.
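For readers hitting the same TypeError, here is a minimal sketch of the replacement in Python 3. It assumes the fix is to pass bytes (b'ASBF') rather than str to replace, since a file opened in 'rb' mode yields bytes. The names replace_marker and fmt are hypothetical and not part of the original code. With '<I' the number is written in the same byte order as the BBBB pack above (D6 A1 89 65); with '>I' it is written as 65 89 A1 D6.

```python
import struct


def replace_marker(data, marker, number, fmt="<I"):
    """Return data with every occurrence of marker replaced by the packed number.

    data and marker must both be bytes; fmt is a struct format for one
    unsigned 32-bit integer ('<I' little-endian, '>I' big-endian).
    """
    return data.replace(marker, struct.pack(fmt, number))


# In-memory sample standing in for the file contents (hypothetical data).
sample = b"head" + b"ASBF" + b"tail"
little = replace_marker(sample, b"ASBF", 1703518678)
big = replace_marker(sample, b"ASBF", 1703518678, ">I")
print(little)
print(big)
```

For a real file, the same function could be applied to the result of a single f.read() on a file opened with 'rb', and the returned bytes written back with 'wb'.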
Related
I have a file which contains binary data. The content of this file is just one long line.
Example: 010101000011101010101
Originally the content was an array of C++ objects with the following data types:
// Note: pseudocode, just for visualisation
int64 var1;
int32 var2[50];
int08 var3;
I want to skip var1 and var3 and only extract the values of var2 as readable decimal values. My idea was to read the file byte by byte and convert the bytes into hex values. In the next step I thought I could "combine" (append) 4 of those hex values to get one int32 value.
Example: 0x10 0xAA 0x00 0x50 -> 0x10AA0050
My code so far:
def append_hex(a, b):
    return (a << 4) | b
with open("file.dat", "rb") as f:
    counter = 0
    tickdifcounter = 0
    current_byte=" "
    while True:
        if (counter >= 8) and (counter < 208):
            tickdifcounter+=1
            if (tickdifcounter <= 4):
                current_byte = append_hex(current_byte, f.read(1))
                if (not current_byte):
                    break
                val = ord(current_byte)
            if (tickdifcounter > 4):
                print hex(val)
                tickdifcounter = 0
                current_byte=""
        counter+=1
        if(counter == 209): #209 bytes = int64 + (int32*50) + int08
            counter = 0
            print
My problem now is that append_hex does not work: the variables are strings, so the bit shift fails.
I am new to Python, so please point out anything I could do in a better way.
You can use the struct module for reading binary files.
This may help: "Reading a binary file into a struct in Python".
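As a rough sketch of what that could look like for the record described in the question: the format string below assumes little-endian data with no padding between fields, which matches the 209-byte record size mentioned in the question but is not confirmed by it. RECORD and read_var2 are hypothetical names.

```python
import struct

# Assumed layout: int64 var1, 50 x int32 var2, int8 var3, little-endian,
# no padding -> 8 + 50 * 4 + 1 = 209 bytes per record.
RECORD = struct.Struct("<q50ib")


def read_var2(data):
    """Yield the 50 int32 values (var2) of each complete record in data."""
    usable = len(data) - len(data) % RECORD.size  # ignore a trailing partial record
    for fields in RECORD.iter_unpack(data[:usable]):
        # fields[0] is var1, fields[1:51] is var2, fields[51] is var3
        yield list(fields[1:51])


# Two fake records built in memory to exercise the function.
fake = b"".join(
    RECORD.pack(*([7] + list(range(start, start + 50)) + [-1]))
    for start in (0, 100)
)
for values in read_var2(fake):
    print(values[:3], "...", values[-1])
```

With a real file, data would be the bytes returned by reading "file.dat" in 'rb' mode. If the data turns out to be big-endian, '<' would become '>'.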
A character can be converted to an int using the ord(x) function. To get the integer value of a multi-byte number, shift each byte left by its position and add the results. For example, from an earlier project:
def parseNumber(string, index):
    # Parentheses are required: << binds less tightly than + in Python.
    return (ord(string[index]) << 24) + (ord(string[index+1]) << 16) + \
           (ord(string[index+2]) << 8) + ord(string[index+3])
Note that this code assumes big-endian data; you will need to reverse the order of the indices to parse little-endian data.
If you know exactly what the size of the struct is going to be (or can easily calculate it from the size of the file), you are probably better off using the "struct" module.
I've got a folder full of very large files whose bytes need to be flipped in groups of 4. Essentially, I need to read each file as binary, reorder the bytes, and then write a new binary file with the reordered bytes.
In essence, what I'm trying to do is read a hex string hexString that looks like this:
"00112233AABBCCDD"
And write a file that looks like this:
"33221100DDCCBBAA"
(i.e. every two characters are one byte, and I need to reverse the bytes within each group of 4)
I am very new to python and coding in general, and the way I am currently accomplishing this task is extremely inefficient. My code currently looks like this:
import binascii
with open(myFile, 'rb') as f:
    content = f.read()
hexString = str(binascii.hexlify(content))
flippedBytes = ""
inc = 0
while inc < len(hexString):
    flippedBytes += hexString[inc + 6:inc + 8]
    flippedBytes += hexString[inc + 4:inc + 6]
    flippedBytes += hexString[inc + 2:inc + 4]
    flippedBytes += hexString[inc:inc + 2]
    inc += 8
..... write the flippedBytes to file, etc
The code I pasted above accurately accomplishes what I need (note: my actual code has a few extra "hexString.replace()" lines to remove unnecessary hex characters, but I've left those out to make the above easier to read). My real problem is that the code takes EXTREMELY long to run on larger files. Some of the files I need to flip are almost 2 GB in size, and the code was going to take almost half a day to complete a single file. I've got dozens of files to process, so that timeframe simply isn't practical.
Is there a more efficient way to flip the bytes in a file in groups of 4?
.... for what it's worth, there is a tool called WinHEX that can do this manually, and only takes a minute max to flip the whole file.... I was just hoping to automate this with python so we didn't have to manually use WinHEX each time
You want to convert your 4-byte integers from little-endian to big-endian, or vice-versa. You can use the struct module for that:
import struct
with open(myfile, 'rb') as infile, open(myoutput, 'wb') as of:
    while True:
        d = infile.read(4)
        if not d:
            break
        le = struct.unpack('<I', d)
        be = struct.pack('>I', *le)
        of.write(be)
Here is a little struct awesomeness to get you started:
>>> import struct
>>> s = b'\x00\x11\x22\x33\xAA\xBB\xCC\xDD'
>>> a, b = struct.unpack('<II', s)
>>> s = struct.pack('>II', a, b)
>>> ''.join([format(x, '02x') for x in s])
'33221100ddccbbaa'
To do this at full speed for a large input, use struct.iter_unpack (available since Python 3.4).
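A minimal sketch of how struct.iter_unpack might be applied here, processing the file in large chunks instead of 4 bytes at a time. It assumes the file length is a multiple of 4; swap32, swap32_file and chunk_size are hypothetical names, and actual speed has not been measured.

```python
import struct


def swap32(chunk):
    """Reverse the byte order of every 4-byte word in chunk.

    len(chunk) must be a multiple of 4, otherwise struct.error is raised.
    """
    return b"".join(
        struct.pack(">I", value) for (value,) in struct.iter_unpack("<I", chunk)
    )


def swap32_file(src_path, dst_path, chunk_size=1 << 20):
    """Stream src_path to dst_path, swapping 4-byte words chunk by chunk.

    chunk_size must be a multiple of 4 so that no word straddles two chunks.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(swap32(chunk))


# The example from the question, as raw bytes rather than a hex string.
print(swap32(bytes.fromhex("00112233AABBCCDD")).hex())
```

Working on raw bytes throughout avoids building a hex string twice the size of the file, which is a large part of the cost in the original loop.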
I have a bitstream file that is comprised of several lines. The c program executable that produces the file outputs the file as a series of "short int" which are actually 16 bit integers. When I open the file in notepad I get the following first few lines:
7E1755EB7909DAC8FF4117BDAA0E86EBD1A8
1C3D47DD6606D812E8862D347288C3A251EB
16D7D02AD908E0083C142C107AB916C55BE0
I need to be able to open this file in Python and convert it to 1s and 0s that represent the original "short int" or in other words an array of 1s and 0s. I think I might also have an issue with the "\n" meaning new line when I read in the file.
I've tried the following code to see which method works best:
import struct
filePathC = "C:\\Working\\Vocoder Sims\\ofile.chan"
fileC = open(filePathC, "rb")
with fileC:
    byteC = fileC.read(8)
    binaryC1 = bin(int(byteC,16))
    binaryC2 = struct.unpack("h" * (len(byteC)/2),byteC)
    print binaryC1
    print binaryC2
The result for when I only read in the first 8 bytes is:
0b1111110000101110101010111101011
(17719, 14129, 13621, 16965)
The issue with the first result is that I should be getting 64 1s and 0s. The problem with the second is that it is a "tuple" instead of an array of 1s and 0s, and I don't believe the integers are 16-bit. They look more like 15-bit, but I'm not sure.
Thanks in advance for any help.
Assuming you have binary data that just happens to decode as ASCII hex, you could read the file into a Python array. It's not much different from your second example of unpacking into a tuple, except that it's faster and has a lower memory footprint. Depending on what you want to do next, it would be reasonable to read into a numpy array instead.
import os
import array
#filePathC = "C:\\Working\\Vocoder Sims\\ofile.chan"
filePathC = "test.bin"
count = os.stat(filePathC).st_size / 2
with open(filePathC, 'rb') as fp:
    binaryC3 = array.array("h")
    binaryC3.fromfile(fp, count)
print binaryC3
print bin(binaryC3[0])
For your example file, this gave me
array('h', [17719, 14129, 13621, 16965, 14647, 14640, 16708, 14403, 17990, 12596, 14129, 17474, 16705, 17712, 13880, 16965, 12612, 14401, 12554, 13123, 13380, 17463, 13892, 12342, 17462, 12600, 17714, 14392, 12854, 13124, 14132, 14386, 17208, 16691, 13618, 17713, 2626, 13873, 14148, 12356, 16690, 14660, 14384, 12357, 14384, 17203, 13361, 17202, 12337, 16695, 14658, 13873, 13635, 16949, 12357])
0b100010100110111
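The 15-digit result above is expected, because bin() drops leading zeros. A small sketch, assuming each value should be shown as a fixed-width 16-bit pattern (to_bits16 is a hypothetical helper name, not from the original answer):

```python
def to_bits16(value):
    """Return the 16-bit two's-complement pattern of value as a string of 0s and 1s."""
    # Masking with 0xFFFF maps negative shorts onto their unsigned bit pattern.
    return format(value & 0xFFFF, "016b")


print(to_bits16(17719))
print(to_bits16(-2))
```

Joining to_bits16(v) for every value in the array would give one long string of 1s and 0s with 16 digits per short.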
I'm modifying an existing Python app that reads a binary file. The format of the file is changing a bit. Currently, one field is defined as bytes 35-36 of the record. The specs also state that "...fields in the records will be character fields written in ASCII." Here's what the current working code looks like:
def to_i16( word ):
    xx = struct.unpack( '2c', word )
    xx = ( ord( xx[ 0 ] ) << 8 ) + ord( xx[ 1 ] )
    return xx
val = to_i16( reg[ 34:36 ] )
But that field is being redefined as bytes 35-37, so it'll be a 24-bit value. I detest working with binary files and am horrible at bit-twiddling. How do I turn that 3-byte value into a 24-bit integer? I've tried a couple of code snippets that I found by googling, but I don't think they are correct. It's hard to be sure, since I'm still waiting for the people who sent the sample 'new format' file to send me a text representation that shows the values I should be coming up with.
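Simply read the 24 bits (I assume big-endian, since the original code uses that byte order as well) and pad them with a leading zero byte to 32 bits. struct.unpack returns a tuple, so take its first element:
val = struct.unpack('>I', b'\x00' + reg[34:37])[0]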
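On Python 3, int.from_bytes is an alternative that needs no padding. The sketch below assumes reg is a bytes object and that the field is big-endian, as above; to_u24 and the sample record are hypothetical.

```python
import struct


def to_u24(three_bytes):
    """Interpret 3 bytes as an unsigned big-endian 24-bit integer."""
    return int.from_bytes(three_bytes, byteorder="big")


# Hypothetical record: 34 zero bytes followed by the 3-byte field.
reg = bytes(34) + b"\x01\x02\x03"
val = to_u24(reg[34:37])

# Cross-check against the struct-based approach with a padding byte.
padded = struct.unpack(">I", b"\x00" + reg[34:37])[0]
print(val, padded)
```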
I have a .bin file, and I simply want to byte-swap its data. For instance, at offset 0x10 it reads AD DE DE C0, and I want it to read DE AD C0 DE.
I know there is a simple way to do this, but I am a beginner just learning Python and am trying to write a few simple programs to help with my daily tasks. I would like to convert the whole file this way, not just offset 0x10.
I will be converting from start offset 0x000000, and the block size/length is 0x1000000.
Here is my code; maybe you can tell me what to do. I am sure I am just not getting it, as I am new to programming and Python. If you could help me I would very much appreciate it.
def main():
    infile = open("file.bin", "rb")
    new_pos = int("0x000000", 16)
    chunk = int("1000000", 16)
    data = infile.read(chunk)
    reverse(data)
def reverse(data):
    output(data)
def output(data):
    with open("reversed", "wb") as outfile:
        outfile.write(data)
main()
You can see the function for reversing. I have tried many different suggestions, and each one either passes the file through untouched or throws errors. I know the reverse function is empty now, but I have tried all kinds of things. I just need reverse to convert AB CD to CD AB.
Thanks for any input.
EDIT: The file is 16 MB and I want to reverse the byte order of the whole file.
In Python 3.4 you can use this:
>>> data = b'\xAD\xDE\xDE\xC0'
>>> swap_data = bytearray(data)
>>> swap_data.reverse()
the result is
bytearray(b'\xc0\xde\xde\xad')
In Python 2, the binary file gets read as a string, so string slicing should easily handle the swapping of adjacent bytes:
>>> original = '\xAD\xDE\xDE\xC0'
>>> ''.join([c for t in zip(original[1::2], original[::2]) for c in t])
'\xde\xad\xc0\xde'
In Python 3, the binary file gets read as bytes. Only a small modification is needed to build another array of bytes:
>>> original = b'\xAD\xDE\xDE\xC0'
>>> bytes([c for t in zip(original[1::2], original[::2]) for c in t])
b'\xde\xad\xc0\xde'
You could also use the < and > endianness format codes in the struct module to achieve the same result:
>>> struct.pack('<2h', *struct.unpack('>2h', original))
'\xde\xad\xc0\xde'
Happy byte swapping :-)
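For a whole file rather than four bytes, extended slice assignment on a bytearray avoids a Python-level loop over every byte. This is a sketch under the assumption that the data length is even; swap16 is a hypothetical name.

```python
def swap16(data):
    """Swap each pair of adjacent bytes in data; len(data) must be even."""
    out = bytearray(len(data))
    out[0::2] = data[1::2]  # odd-indexed input bytes move to even positions
    out[1::2] = data[0::2]  # even-indexed input bytes move to odd positions
    return bytes(out)


print(swap16(b"\xAD\xDE\xDE\xC0"))
```

Applied to the question's program, reverse(data) could return swap16(data) and pass that result on to output.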
data = b'\xAD\xDE\xDE\xC0'
reversed_data = data[::-1]
print(reversed_data)
# b'\xc0\xde\xde\xad'
Python3
bytes(reversed(b'\xAD\xDE\xDE\xC0'))
# b'\xc0\xde\xde\xad'
Python's slice notation can reverse the values of a list: nameOfList[::-1]
So, I might store the hex values as strings in a list and then try something like:
def reverseList(aList):
    rev = aList[::-1]
    outString = ""
    for el in rev:
        outString += el + " "
    return outString