I am trying to read some negative values from a compressed file that has the hex values:
FFFFFFFF, which should be -1, but displays as 4294967295
FFFFFFFE, which should be -2, but displays as 4294967294
I know FF should be the marker for - but is there a method in python that can just read the values directly or do I have to make my own method?
Thank you!
Edit: This is for Python 2.6. My program reads from binary data and I am just displaying it in hex to make it simpler. The program simply reads 4 bytes at a time and grabs values from those 4 bytes. It is just some of those values are negative and display the above numbers. I am also hoping someone can explain how Python interprets the binary data into a value so I can write a reverse protocol. Thank you!
I read from hex and convert to values through this method.
def readtoint(read):
    keynumber = read[::-1]
    hexoffset = ''
    for letter in keynumber:
        temp = hex(ord(letter))[2:]
        if len(temp) == 1:
            temp = "0" + temp
        hexoffset += temp
    value = int(hexoffset, 16)
    return value
It grabs 4 bytes, reverses their order, then converts the hex value into an int value. The values I posted above are already reversed.
Use the struct module:
import struct
def readtoint(read):
    return struct.unpack('<i', read)[0]
Example:
>>> readtoint('\xfe\xff\xff\xff')
-2
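To show what is going on underneath, here is a minimal Python 3 sketch (the question targets Python 2.6, so treat this as an illustration rather than drop-in code). There is no separate "FF marker" for negative numbers: in two's complement the top bit of the 32-bit value carries the sign, so the same four bytes give 4294967294 when read as unsigned and -2 when read as signed. The helper names to_signed32 and from_signed32 are made up for this example.

```python
import struct

raw = b"\xfe\xff\xff\xff"  # little-endian bytes of FFFFFFFE

# Same four bytes, two interpretations.
unsigned = struct.unpack("<I", raw)[0]  # unsigned 32-bit
signed = struct.unpack("<i", raw)[0]    # signed 32-bit (two's complement)

def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as signed (hypothetical helper)."""
    if value & 0x80000000:          # top bit set means the value is negative
        return value - 0x100000000  # subtract 2**32
    return value

def from_signed32(value):
    """Reverse direction: signed int back to 4 little-endian bytes."""
    return struct.pack("<i", value)

print(unsigned, signed, to_signed32(unsigned))
print(from_signed32(-2) == raw)
```

For the reverse protocol, struct.pack with the same format string produces the bytes to write back.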
Post your file-reading code to get the perfect answer. But the answer to your question is almost certainly here:
Reading integers from binary file in Python
I would like to know how I can convert an integer into a binary number in a sequence of length = 32 bytes and least-significant-bit first encoding, in Python 3.
I need to do this conversion for an exercise in my cryptography class. I have tried the Python function int.to_bytes(), but it does not seem to work.
The integer I would like to convert is x = 999.
This is my code if it can help:
def reading_a_pdf_file(filename):
    rfile = open(filename, 'rb')
    contains = rfile.read()
    rfile.close()
    return contains
# ...
def writing_in_a_pdf_file(filename, txt):
    wfile = open(filename, 'wb')
    wfile.write(txt)
    wfile.close()
# ...
import nacl.secret
import nacl.utils
x= 999
key = x.to_bytes(32, byteorder='little')
# creating the box of encryption/decryption
box = nacl.secret.SecretBox(key)
# reading the encrypted file
encrypted = reading_a_pdf_file('L12-14.enc.pdf')
# we decrypt the contain of the file
decrypted = box.decrypt(encrypted)
# finally we save into a new pdf file
writing_in_a_pdf_file('L12-14.pdf', decrypted)
print("The file have been successfully decrypted in 'L12-14.pdf'")
At the end of the program, I am supposed to get the file L12-14.pdf, but I get the error "Decryption failed. Ciphertext failed verification", which means that my key is not the right one.
As I know the integer is right, I suppose I am making a mistake when I convert it.
Can you help me ?
So, first of all: welcome to Mr. Lutenberger's course; we're sharing a lecture here.
The issue is in fact the LSB encoding of the binary number.
I won't outline the complete solution, in the hope that you can solve this on your own. If it doesn't work, ping me; I got it decrypted and can give you further hints.
So, you have 999 as a solution. Converted to binary, that is 1111100111.
Note, however, that this is MSB-first and has 10 bits (both are important later).
First thing to do: convert the number to LSB-first order. This essentially means reversing the order of the bits. NOTE: at this point, do NOT prepend or append 0s to fill up bytes!
Now that you have the number in LSB-first order, you want it in reverse byte order in Python, because passing it directly would produce a run of zero bytes with the data at the end. You correctly used byteorder='little' here. However, the number is 10 bits wide, so it spans TWO bytes. To get the bytes AND the bits in the correct order, AND both at the start of the 32-byte stream, you have to swap the two bytes as well, because the second byte (the "end" of the number) becomes the first one once byteorder='little' is applied. For this step, append six 0s to the second byte to fill it up before swapping, in order to keep the bytes separate.
Now, with the manipulated head of the byte stream, decode the value as an int and use that as your x. That should work. Hint: x has 5 digits now.
As a side note: may I ask how you calculated 999?
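One possible reading of the steps above, as a Python 3 sketch. The helper name lsb_first_key_int is hypothetical, and this only mirrors the bit and byte shuffling described; whether the resulting key actually decrypts the exercise file is not verified here.

```python
def lsb_first_key_int(value):
    """Hypothetical helper: reverse the bits of value, pad to whole bytes,
    and return the int that byteorder='little' turns into those bytes."""
    bits = bin(value)[2:]       # MSB-first, no leading zeros
    reversed_bits = bits[::-1]  # LSB-first, still unpadded
    # Pad on the right so the last (partial) byte is filled with 0s.
    padded = reversed_bits + "0" * (-len(reversed_bits) % 8)
    head = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    # Interpreting the head as little-endian means to_bytes(..., 'little')
    # will later write these bytes back in the same order.
    return int.from_bytes(head, byteorder="little")

x = lsb_first_key_int(999)
key = x.to_bytes(32, byteorder="little")
print(x, key[:2].hex())
```

The remaining 30 bytes of key are zero, as with the original x.to_bytes(32, byteorder='little') call.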
I have been working on a program in which I am trying to take a big binary file (as a string of bits) and pack it into a file. I have tried for days to make this work. Here is the code I wrote to pack the large binary string.
binaryRecieved="11001010101....(Shortened)"
f=open(fileName,'wb')
m=long(binaryRecieved,2)
struct.pack('i',m)
f.write(struct.pack('i',m))
f.close()
quit()
I am left with the error
struct.pack('i',x)
struct.error: integer out of range for 'i' format code
My integer is out of range, so I was wondering if there is a different way of going about with this.
Thanks
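Convert your bit string to a byte string: see, for example, the question "Converting bits to bytes in Python". Then write the byte string to the file directly; struct.pack('c', ...) accepts only a single byte, so for a longer string use a format such as struct.pack('%ds' % len(bytestring), bytestring).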
For encoding m in big-endian order (like "ten" being written as "10" in normal decimal use) use:
def as_big_endian_bytes(i):
    out = bytearray()
    while i:
        out.append(i & 0xff)
        i = i >> 8
    out.reverse()
    return out
For encoding m in little-endian order (like "ten" being written as "01" in normal decimal use) use:
def as_little_endian_bytes(i):
    out = bytearray()
    while i:
        out.append(i & 0xff)
        i = i >> 8
    return out
Both functions work on numbers, as in your question, so the returned bytearray may be shorter than expected (because leading zeroes do not matter for numbers).
For an exact representation of a binary-digit string (which is only possible if its length is divisible by 8) you would have to do:
def as_bytes(s):
    assert len(s) % 8 == 0
    out = bytearray()
    # step through the string 8 bits at a time, including the last group
    for i in range(0, len(s), 8):
        out.append(int(s[i:i+8], 2))
    return out
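On Python 3 the same exact conversion can be sketched with int.to_bytes, which keeps leading zero bytes because the output length is given explicitly. The question uses Python 2 (long), so this is an alternative under that assumption, and bits_to_bytes is a name invented for the example.

```python
def bits_to_bytes(bit_string):
    """Convert a string of '0'/'1' characters to bytes (hypothetical helper)."""
    if len(bit_string) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    if not bit_string:
        return b""
    # Fixing the byte count preserves leading zero bytes.
    return int(bit_string, 2).to_bytes(len(bit_string) // 8, byteorder="big")

payload = bits_to_bytes("1100101000000001")
print(payload)

# Writing it out needs no struct at all:
# with open(file_name, "wb") as f:
#     f.write(payload)
```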
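In struct.pack you have used 'i', which represents a 4-byte signed integer and is therefore limited to the range -2**31 to 2**31 - 1. As your code states, your value is a long that does not fit in that range. You could use 'd' instead of 'i' to pack the value as a double, but note that a double represents integers exactly only up to 2**53, so it will not preserve an arbitrarily long bit string.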
See Python struct for more information.
I have a string that looks like this
'\x00\x03\x10B\x00\x0e12102 G1103543\x10T\x07\x21'
I have been able to match the data I want which is "12102 G1103543" with this.
re.findall('\x10\x42(.*)\x10\x54', data)
Which will output this
'\x00\x0e12102 G1103543'
The problem I'm having is that \x10\x54 is not always at the end of the data I want. However, I have noticed that the first two bytes give the length of the data, i.e. \x00\x0e = 14, so the data is 14 characters long.
Is there a better way to do this, like matching the first part and then cutting the next 14 characters? I should also say that the length will vary, as I'm looking to match several things.
Also, is there a way to output the string entirely in hex so it's easier for me to read when working in a Python shell, i.e. \x10B == \x10\x42?
Thank You!
Edit: I managed to come up with this working solution.
newdata = re.findall('\x10\x42(.*)', data)
newdata[0][2:2 + int(newdata[0][0:2].encode('hex'), 16)]
Please note that you have a structured binary file on your hands, and it is foolish to try to use regular expressions to extract data from it.
First of all, the "hex data" you talk about is not "hex data"; it is just bytes in your stream outside the printable ASCII range. Python 2 therefore displays these bytes as \x10 and so on, but internally each is just a single byte with the value 16 (when viewed as decimal). The \x42 you write corresponds to the ASCII letter B, and that is why you see B in your representation.
So your best bet there would be to get the file specification, and read the data you want from there using the struct module and byte-string slicing.
If you can't get the file spec, then finding the fields of interest is reverse-engineering work, just as you are already doing. But even then, you should write some code with the struct module to get your values, since field lengths (and most likely offsets) are encoded in the byte stream itself.
In this example, your marker "\x10\x42" will rarely be a marker per se; most likely its position is indicated by other factors in the file (either a fixed place in the file definition, or an offset stored earlier in the file).
But if you are correctly using this as a marker, you could use regular expressions just to find all offsets of the "\x10\x42" marker, as you are doing, and then interpret the following two bytes as the message length:
import struct, re
def get_data(data, sep=b"\x10B"):
    results = []
    for match in re.finditer(sep, data):
        offset = match.start()
        msglen = struct.unpack(">H", data[offset + 2: offset + 4])[0]
        print(msglen)
        results.append(data[offset + 4: offset + 4 + msglen])
    return results
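As a usage illustration, here is a self-contained Python 3 variant run against the sample string from the question. It assumes, as the question suggests, that the two bytes after the marker are a big-endian length. The name extract_fields is hypothetical, and binascii.hexlify is one way to view the whole string in hex, which the question also asked about.

```python
import binascii
import struct

def extract_fields(data, marker=b"\x10\x42"):
    """Return the payloads that follow each marker and 2-byte length."""
    fields = []
    pos = data.find(marker)
    while pos != -1:
        header_end = pos + len(marker) + 2
        if header_end > len(data):
            break  # not enough bytes left for a length field
        (length,) = struct.unpack(">H", data[pos + len(marker):header_end])
        fields.append(data[header_end:header_end + length])
        # Continue searching after this payload, so marker-like bytes
        # inside the payload are not mistaken for a new field.
        pos = data.find(marker, header_end + length)
    return fields

sample = b"\x00\x03\x10B\x00\x0e12102 G1103543\x10T\x07\x21"
fields = extract_fields(sample)
print(fields)
print(binascii.hexlify(sample))  # every byte as two hex digits
```

Using find instead of a regular expression avoids having to escape marker bytes that happen to be regex metacharacters.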
I usually perform things like this in C++, but I'm using python to write a quick script and I've run into a wall.
If I have a binary list (or whatever python stores the result of an "fread" in). I can access the individual bytes in it with: buffer[0], buffer[1], etc.
I need to change the bytes [8-11] to hold a new 32-bit file-size (read: there's already a filesize there, I need to update it). In C++ I would just get a pointer to the location and cast it to store the integer, but with python I suddenly realized I have no idea how to do something like this.
How can I update 4 bytes in my buffer at a specific location to hold the value of an integer in python?
EDIT
I'm going to add more because I can't seem to figure it out from the solutions (though I can see they're on the right track).
First of all, I'm on Python 2.4 (and can't upgrade, big corporation servers), so that apparently limits my options. Sorry for not mentioning that earlier; I wasn't aware it had so many fewer features.
Secondly, let's make this ultra-simple.
Let's say I have a binary file named 'myfile.binary' with the five-byte contents '4C53535353' in hex - this equates to the ASCII representations for letters "L and 4xS" being alone in the file.
If I do:
f = open('myfile.binary', 'rb')
contents = f.read(5)
contents should (from Sven Marnach's answer) hold a five-byte immutable string.
Using Python 2.4 facilities only, how could I change the 4 S's held in 'contents' to an arbitrary integer value? I.e. give me a line of code that can make byte indices contents [1-4] contain the 32-bit integer 'myint' with value 12345678910.
What you need is this function:
struct.pack_into(fmt, buffer, offset, v1, v2, ...)
It's documented at http://docs.python.org/library/struct.html near the top.
Example code:
import struct
import ctypes
data=ctypes.create_string_buffer(10)
struct.pack_into(">i", data, 5, 0x12345678)
print list(data)
Similar posting: Python: How to pack different types of data into a string buffer using struct.pack_into
EDIT: Added a Python 2.4 compatible example:
import struct
f=open('myfile.binary', 'rb')
contents=f.read(5)
data=list(contents)
data[1:5]=struct.pack(">i", 0x12345678)  # overwrite the four S bytes at indices 1-4
print data
Have a look at the struct module. You need the pack function.
EDIT:
The code:
import struct
s = "LSSSS" # your string
s = s[0] + struct.pack('<I', 1234567891) # note "shorter" constant than in your example
print s
Output:
L╙☻ЦI
struct.pack should be available in Python2.4.
Your number "12345678910" cannot be packed into 4 bytes, I shortened it a bit.
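The same idea generalizes to any offset by slicing around the packed bytes. This is a minimal sketch with a made-up helper name, patch_uint32. It is written for Python 3; it relies only on slicing and struct.pack, so it should carry over to older versions if the b prefix on the literal is dropped.

```python
import struct

def patch_uint32(buf, offset, value, fmt="<I"):
    """Return a copy of buf with 4 bytes at offset replaced (hypothetical helper)."""
    packed = struct.pack(fmt, value)  # raises struct.error if value does not fit
    return buf[:offset] + packed + buf[offset + len(packed):]

contents = b"LSSSS"
patched = patch_uint32(contents, 1, 1234567891)
print(repr(patched))
```

Because strings are immutable, the function builds a new string instead of modifying the old one. For the original task (bytes 8-11) the call would use offset 8.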
The result of file.read() is a string in Python, and it is immutable. Depending on the context of the task you are trying to accomplish, there are different solutions to the problem.
One is using the array module: Read the file directly as an array of 32-bit integers. You can modify this array and write it back to the file.
import array
with open("filename", "rb") as f:
    f.seek(0, 2)  # jump to the end to find the file size
    size = f.tell()
    f.seek(0)
    data = array.array("i")
    assert data.itemsize == 4
    data.fromfile(f, size // 4)
data[2] = new_value
# use data.tofile(g) to write the data back to a new file g
You could install the numpy module, which is often used for scientific computing.
import numpy
read_data = numpy.fromfile("filename", dtype=numpy.uint32)
Then access the data at the desired location and make your changes.
The following is just a demonstration for you to understand what really happens when the four bytes are converted into an integer.
Suppose you have a number: 15213
Decimal: 15213
Binary: 0011 1011 0110 1101
Hex: 3 B 6 D
On little-endian systems (e.g. x86 machines), this number can be represented as a 4-byte string: b"\x6d\x3b\x00\x00", which prints as b"m;\x00\x00". To convert the four bytes into an integer, we simply do a bit of base conversion, which in this case is:
sum(n*(256**i) for i,n in enumerate(b"\x6d\x3b\x00\x00"))
This gives you the result: 15213
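For comparison, a short Python 3 sketch showing that the manual base-256 sum agrees with the library routes (struct.unpack with an explicit little-endian format, and int.from_bytes). The variable names are only for illustration.

```python
import struct

raw = b"\x6d\x3b\x00\x00"

# Manual base-256 conversion: byte i contributes byte * 256**i.
manual = sum(byte * (256 ** i) for i, byte in enumerate(raw))

# The same interpretation through the standard library.
via_struct = struct.unpack("<I", raw)[0]
via_int = int.from_bytes(raw, byteorder="little")

print(manual, via_struct, via_int)
```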
Similar to this question, I am trying to read in an ID3v2 tag header and am having trouble figuring out how to get individual bytes in python.
I first read all ten bytes into a string. I then want to parse out the individual pieces of information.
I can grab the two version number chars in the string, but then I have no idea how to take those two chars and get an integer out of them.
The struct package seems to be what I want, but I can't get it to work.
Here is my code so-far (I am very new to python btw...so take it easy on me):
def __init__(self, ten_byte_string):
    self.whole_string = ten_byte_string
    self.file_identifier = self.whole_string[:3]
    self.major_version = struct.pack('x', self.whole_string[3:4])  # this
    self.minor_version = struct.pack('x', self.whole_string[4:5])  # and this
    self.flags = self.whole_string[5:6]
    self.len = self.whole_string[6:10]
Printing out any value except is obviously crap because they are not formatted correctly.
If you have a string, with 2 bytes that you wish to interpret as a 16 bit integer, you can do so by:
>>> s = '\0\x02'
>>> struct.unpack('>H', s)
(2,)
Note that the > is for big-endian (the largest part of the integer comes first). This is the format id3 tags use.
For other sizes of integer, you use different format codes, e.g. "i" for a signed 32-bit integer. See help(struct) for details.
You can also unpack several elements at once, e.g. for 2 unsigned shorts followed by a signed 32-bit value:
>>> a,b,c = struct.unpack('>HHi', some_string)
Going by your code, you are looking for (in order):
a 3 char string
2 single byte values (major and minor version)
a 1 byte flags variable
a 32 bit length quantity
The format string for this would be:
ident, major, minor, flags, len = struct.unpack('>3sBBBI', ten_byte_string)
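One caveat worth sketching: as far as I know, the ID3v2 header stores its size as a "synchsafe" integer, using only the low 7 bits of each of the four bytes, so reading it with a plain 'I' can give a wrong length once any byte other than the last is non-zero. The Python 3 sketch below unpacks the size as four raw bytes and decodes it by hand; parse_id3v2_header and the sample header are made up for illustration.

```python
import struct

def parse_id3v2_header(header):
    """Split a 10-byte ID3v2 header into its fields (hypothetical helper)."""
    ident, major, minor, flags, raw_size = struct.unpack(">3sBBB4s", header)
    size = 0
    for byte in bytearray(raw_size):
        # Each byte contributes 7 bits; the top bit is always 0.
        size = (size << 7) | (byte & 0x7F)
    return ident, major, minor, flags, size

sample = b"ID3\x03\x00\x00\x00\x00\x02\x01"  # made-up header for the example
print(parse_id3v2_header(sample))
```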
Why write your own? (Assuming you haven't checked out these other options.) There are a couple of options out there for reading ID3 tag info from MP3s in Python. Check out my answer over at this question.
I am trying to read in an ID3v2 tag header
FWIW, there's already a module for this.
I was going to recommend the struct package but then you said you had tried it. Try this:
self.major_version, self.minor_version = struct.unpack('BB', self.whole_string[3:5])
The pack() function converts Python data types to bytes, and the unpack() function converts bytes to Python data types.