Python Parsing MBR

I am just starting out with Python scripting and I am trying to write a program that will parse through a provided MBR but I'm not sure how to start.
I want to write a program that will parse a portion of the MBR's partition table. The first partition entry is located at offset 0x1BE. I need to print out the status byte (1 byte at the starting address), the partition type (1 byte at 0x1BE + 4) and the address of the first sector in the partition (at 0x1BE + 8).
Any help would be greatly appreciated!

Batteries included. Use the array or struct module.
Or else one of these (but here they're likely overkill):
https://github.com/digidotcom/python-suitcase
https://github.com/vstinner/hachoir3
https://github.com/Muterra/py_smartyparse

I know this is a very old question, but I came here looking for an answer and the only existing one did not answer the question itself very well. I believe I now understand both the question and the answer, so let's start with the first part, the status byte. The status byte is typically 0x80 for an active (bootable) partition or 0x00 for an inactive one, and it is only one byte long. It can be read with the following lines:
import struct  # Used later to unpack multi-byte fields; bytes/bytearray are built in
with open(filename, 'rb') as binary_file:  # filename: path to your disk image or device
    mbr = binary_file.read(512)  # The first 512 bytes are the first sector, which is the MBR
status_flag = mbr[0x1BE]  # Indexing bytes in Python 3 gives an int (0-255)
The status flag is only a single byte, and because we know it is located at offset 0x1BE, we can simply index into mbr (the data we read from the file, where each index holds one byte). 0x1BE is the same number as the integer 446, so in the example above we are really reading mbr[446] (because the literal starts with 0x, Python interprets it as hexadecimal, and 0x1BE == 446).
Moving on to the second part: the partition type is also a single byte, stored at 0x1BE + 4, or 0x1C2. So, just like with the status byte, we can simply do:
partition_type = mbr[0x1C2]
Because the partition type is also just a byte, and each index of our mbr array is a byte, we can simply pull the value at the address 0x1C2.
As for the last part, the address of the first sector is a 4-byte value that starts at 0x1BE + 8, or 0x1C6. Because it is 4 bytes long, it occupies 0x1C6 through 0x1C9, so the end of the slice (which is exclusive) is 0x1BE + 12, or 0x1CA. To read it, we can do the following:
first_sector_addr = struct.unpack('<I', mbr[0x1C6:0x1CA])[0]
'''
For the line above, we are using the unpack function, also
from the struct module. It takes two arguments: a format
string (byte order/size/alignment, followed by type codes)
and the data to read (https://docs.python.org/3/library/struct.html).
We must read the data as little-endian ('<') and as an unsigned
int ('I'). unpack always returns a tuple, hence the [0]
(https://thestarman.pcministry.com/asm/mbr/PartTables.htm).
'''
Once we have all of the variables collected (status_flag, partition_type, first_sector_addr), we can print each of them to the screen. I recommend printing the first two as hex values, since hex codes are how these fields are usually identified. For example, if the partition type has the value 0x83, it is a Linux native file system (https://thestarman.pcministry.com/asm/mbr/PartTypes.htm).
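Putting the three reads together, here is a minimal sketch that prints the first two fields in hex, as recommended above. The function names and the command-line usage (`python mbr.py disk.img`, where `disk.img` stands for whatever raw image or device you read) are illustrative assumptions.

```python
import struct
import sys

def first_partition_fields(mbr):
    """Return (status, partition type, first-sector LBA) of the first entry."""
    if len(mbr) < 512:
        raise ValueError("expected at least 512 bytes (one sector)")
    status_flag = mbr[0x1BE]                # 1 byte at 0x1BE
    partition_type = mbr[0x1BE + 4]         # 1 byte at 0x1C2
    # 4 bytes at 0x1C6, little-endian unsigned int; unpack returns a tuple
    first_sector_addr = struct.unpack("<I", mbr[0x1BE + 8:0x1BE + 12])[0]
    return status_flag, partition_type, first_sector_addr

def main(path):
    with open(path, "rb") as f:
        mbr = f.read(512)
    status, ptype, lba = first_partition_fields(mbr)
    print("Status:         0x%02X" % status)
    print("Partition type: 0x%02X" % ptype)
    print("First sector:   %d" % lba)

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```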
https://thestarman.pcministry.com/asm/mbr/PartTables.htm
https://en.wikipedia.org/wiki/Master_boot_record#Sector_layout
https://www.ijais.org/research/volume10/number8/sadi-2016-ijais-451541.pdf
(The last link downloads a PDF, but it is a useful resource on the MBR.)

Related

Protobuf-net unrecognized stream prefix

I'm trying to reverse engineer the Quasar RAT protobuf protocol structure.
Quasar is a Remote Administration Tool written in C# which is open source and can be found online here.
https://github.com/quasar/QuasarRAT
I've managed to reverse most of it, and I can now connect to the Quasar server from a Python script. However, one question remains open: every byte stream sent from the client to the server appears to begin with a 3-byte field that is not registered in any protobuf class within Quasar. This field seems to give the length of the message, not including the prefix bytes. For example, for a prefixed byte stream generated for an array of size 0x2d2, these are the bytes prepended to the message:
0x0A, 0xCF, 0x05
However, if I change the message fields before serializing the message, this byte sequence changes, except for the first byte, 0x0A. If I keep appending bytes to the message fields, the second byte grows, and if I overflow the second byte (make it go above 0xFF), the third byte is incremented and the second byte resets to 0x80. But the math doesn't make sense to me at all: this field should give the size of the array, but it doesn't under any sensible formula I could come up with. I know that protobuf-net can generate PreLengthPrefix bytes to prefix the message with its length, but that is not the case here.
Any help would be appreciated.
The encoding rules are here: https://developers.google.com/protocol-buffers/docs/encoding
Basically, each field is encoded as a field header (aka "tag") followed by a payload. The field header is a "varint" (see the encoding guide) whose value is an integer composed of the field number and the wire type: the wire type is the 3 least significant bits, and the field number is the rest (shifted right by 3 bits). In the case of 0x0A (binary 1010), the wire type is 2 (binary 010) and the field number is 1.
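As a quick check of that arithmetic, splitting a field header is just a shift and a mask. This sketch assumes the header fits in one byte, which holds for field numbers 1 to 15; larger field numbers need full varint decoding first. The helper name is illustrative.

```python
def split_tag(tag):
    """Split a protobuf field header (tag) value into (field_number, wire_type)."""
    field_number = tag >> 3   # everything above the low 3 bits
    wire_type = tag & 0x07    # the 3 least significant bits
    return field_number, wire_type

print(split_tag(0x0A))  # (1, 2): field 1, wire type 2 (length-delimited)
```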
How you treat the payload depends on the wire type. For wire type 2 (length prefixed), you should expect next:
a varint that is the length of the payload in bytes, then
that many bytes of the actual payload
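Applying those rules to the three prefix bytes from the question: 0x0A is the field header (field 1, wire type 2), and 0xCF 0x05 is a varint holding the payload length. In the minimal decoder below (the function name is illustrative), 0xCF contributes its low 7 bits (0x4F = 79) and its set high bit says another byte follows; 0x05 then contributes 5 << 7 = 640, for a total of 719. This also explains why the second byte "resets to 0x80" on overflow: 0x80 is just the continuation bit with zero low bits. If the 0x2d2 array size in the question counts the whole stream, 3 prefix bytes + 719 = 722 = 0x2D2.

```python
def read_varint(data, offset=0):
    """Decode a base-128 varint starting at offset; return (value, next_offset).

    Each byte contributes its low 7 bits, least-significant group first;
    a set high bit (0x80) means another byte follows.
    """
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7

prefix = bytes([0x0A, 0xCF, 0x05])
length, payload_start = read_varint(prefix, 1)  # skip the 0x0A field header
print(length, payload_start)  # 719 3
```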
Unfortunately protobuf is ambiguous without a schema, so knowing that you have length prefixed data doesn't tell you what the data is; a length prefixed payload could be:
a UTF-8 string
a raw BLOB (bytes)
a sub-message
a "packed" array of some primitive type (integers/floating point numbers/etc) - remembering that the length prefix is the number of bytes, not the number of elements; the elements are not even necessarily fixed size (they could themselves be varints)
In many ways, the purpose of the wire type isn't to tell you how to interpret the data; it is to tell you how to skip (or just store verbatim) the field if it isn't one you know about. For example, somebody else is using V3 of the API and you have only updated your schema to V2; they send a V3 message to your V2 API; V3 has extra fields you don't care about, and the deserializer needs to not break when it hits them, so the wire type tells it how to ignore the field (i.e. what the rules are for finding the next field). Otherwise, we could just use the schema information and not store the wire type in the payload at all (although it is also used as an optimization on repeated primitive data, via "packed" arrays: it is up to the serializer whether it encodes such data as one length-prefixed field or as lots of field header/value pairs).
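The skipping rule described above can be sketched as a schema-less walker: it reads each field header, then uses only the wire type to find where the next field starts. This is a minimal illustration, not protobuf-net's or Google's implementation; it handles wire types 0, 1, 2 and 5, deliberately rejects the deprecated group types 3 and 4, and the function names are hypothetical.

```python
def read_varint(data, offset):
    """Decode a base-128 varint; return (value, next_offset)."""
    value = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7

def skip_field(data, offset, wire_type):
    """Return the offset just past a field payload, using only its wire type."""
    if wire_type == 0:                       # varint
        _, offset = read_varint(data, offset)
        return offset
    if wire_type == 1:                       # fixed 64-bit
        return offset + 8
    if wire_type == 2:                       # length-delimited
        length, offset = read_varint(data, offset)
        return offset + length
    if wire_type == 5:                       # fixed 32-bit
        return offset + 4
    raise ValueError("unsupported wire type %d (groups are not handled)" % wire_type)

def list_fields(data):
    """Walk a message without a schema, yielding (field_number, wire_type, raw).

    For wire type 2, raw still includes the length prefix.
    """
    offset = 0
    while offset < len(data):
        tag, offset = read_varint(data, offset)
        field_number, wire_type = tag >> 3, tag & 0x07
        start = offset
        offset = skip_field(data, offset, wire_type)
        yield field_number, wire_type, data[start:offset]
```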

How convert an integer into a binary number in a sequence of length=32 bytes in python

I would like to know how I can convert an integer into a binary number as a sequence of 32 bytes, using least-significant-bit-first encoding, in Python 3.
I need to do this conversion for an exercise in my cryptography class. I have tried the Python function int.to_bytes(), but it doesn't seem to work...
The integer I would like to convert is x = 999.
This is my code if it can help:
def reading_a_pdf_file(filename):
    with open(filename, 'rb') as rfile:
        contains = rfile.read()
    return contains
# ...
def writing_in_a_pdf_file(filename, txt):
    with open(filename, 'wb') as wfile:
        wfile.write(txt)
# ...
import nacl.secret
import nacl.utils
x= 999
key = x.to_bytes(32, byteorder='little')
# creating the box of encryption/decryption
box = nacl.secret.SecretBox(key)
# reading the encrypted file
encrypted = reading_a_pdf_file('L12-14.enc.pdf')
# we decrypt the contain of the file
decrypted = box.decrypt(encrypted)
# finally we save into a new pdf file
writing_in_a_pdf_file('L12-14.pdf', decrypted)
print("The file has been successfully decrypted in 'L12-14.pdf'")
At the end of the program, I am supposed to get the file L12-14.pdf, but I get the error: "Decryption failed. Ciphertext failed verification", which means that my key is not the right one.
As I know the integer is right, I suppose I am making a mistake when I convert it.
Can you help me?
So, first of all: welcome to Mr. Lutenberger's course; we're sharing a lecture here.
The issue is in fact the LSB-encoding of the binary number.
I won't outline the complete solution, in the hope that you can solve this on your own. If it doesn't work, ping me; I got it decrypted and can give you further hints.
So, you have 999 as a solution. Converted to binary, that is 1111100111.
Note, however, that this is MSB-first and has 10 bits (both are important later).
First thing to do: convert the number to LSB-first order. This essentially means reversing the bits. NOTE: at this point, do NOT prepend or append 0s to fill up bytes!
Now that you have the number in LSB-first order, you want it in reverse byte order in Python, since passing it directly would result in a bunch of 0s with the data at the end. You correctly used byteorder='little' here. However, the number here is 10 bits long, so it spans TWO bytes. For the bytes AND bits to be in the correct order AND both at the beginning of our 32-byte stream, we have to swap the two bytes involved as well, because the second byte (the "end" of the number) will become the first one after applying byteorder='little'. For this step, the second byte has to be padded at the end with six 0s before swapping, in order to keep the bytes "separate".
Now, with the manipulated head of the byte stream, decode the value as an int and pass that as the value of x. That should work. Hint: x has 5 digits now.
As a side note: may I ask how you calculated 999?
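For checking a result: under the interpretation in this answer, "least-significant-bit first" means the bits of x are written starting from bit 0 and then packed into bytes most-significant-bit first. A minimal sketch of that interpretation (not necessarily the course's reference solution; the helper name is hypothetical) is to reverse all 256 bits of the key, which yields the same 32 bytes as the bit and byte swaps described above.

```python
def lsb_first_key(x, length=32):
    """Encode x as `length` bytes with bit 0 of x as the very first bit of the stream.

    Assumption: bits are packed into bytes MSB-first, as in the answer above.
    """
    nbits = 8 * length
    if x >= 1 << nbits:
        raise ValueError("x does not fit in %d bytes" % length)
    bits = format(x, "0%db" % nbits)   # MSB-first bit string, zero-padded
    reversed_bits = bits[::-1]         # now bit 0 of x comes first
    return int(reversed_bits, 2).to_bytes(length, byteorder="big")

key = lsb_first_key(999)
print(key[:2].hex())                            # e7c0
print(int.from_bytes(key, byteorder="little"))  # 49383, a 5-digit x as in the hint
```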

How do I search for a set amount of hex and non hex data in python

I have a string that looks like this
'\x00\x03\x10B\x00\x0e12102 G1103543\x10T\x07\x21'
I have been able to match the data I want which is "12102 G1103543" with this.
re.findall('\x10\x42(.*)\x10\x54', data)
Which will output this
'\x00\x0e12102 G1103543'
The problem I'm having is that \x10\x54 is not always at the end of the data I want. However, what I have noticed is that the two bytes after the marker give the length of the data, i.e. \x00\x0e = 14, so the data is 14 characters long.
Is there a better way to do this, like matching the first part and then cutting out the next 14 characters? I should also say that the length will vary, as I'm looking to match several things.
Also, is there a way to output the string entirely in hex, so it's easier for me to read when working in a Python shell? i.e. \x10B == \x10\x42
Thank You!
Edit: I managed to come up with this working solution.
newdata = re.findall('\x10\x42(.*)', data)
length = int(newdata[0][0:2].encode('hex'), 16)  # the two length bytes, read as hex
newdata[0][2:2 + length]
Please note that you have a structured binary file on your hands, and it is foolish to try to use regular expressions to extract data from it.
First of all, the "hex data" you talk about is not "hex data": it is just bytes in your stream outside the printable ASCII range, so Python 2 displays these characters as \x10 and so on, but internally each is just a single byte (with the value 16, in decimal, in this case). The \x42 you write corresponds to the ASCII letter B, and that is why you see B in your representation.
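To make that concrete, here is the sample string from the question as Python 3 bytes. Indexing shows single integer byte values, and the two bytes after the marker are just the number 14 stored big-endian.

```python
data = b"\x00\x03\x10B\x00\x0e12102 G1103543\x10T\x07\x21"

assert b"\x42" == b"B"                    # \x42 is just the byte for ASCII "B"
print(data[2])                            # 16: a single byte, which repr() shows as \x10
print(data[4:6])                          # b'\x00\x0e': two raw bytes, not "hex text"
print(int.from_bytes(data[4:6], "big"))   # 14
```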
So your best bet would be to get the file specification and read the data you want using the struct module and byte-string slicing.
If you can't get the file spec, it is reverse-engineering work to find out the fields of interest, just like you are already doing. But even then, you should write some code with the struct module to get your values, since field lengths (and most likely offsets) are encoded in the byte stream itself.
In this example, your marker "\x10\x42" is probably not a marker per se; more likely its position is indicated by other factors in the file (either a fixed place in the file definition, or an offset earlier in the file).
But if you are correct in using this as a marker, you could use regular expressions just to find all offsets of the "\x10\x42" marker, as you are doing, and then interpret the following two bytes as the message length:
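import re
import struct

def get_data(data, sep=b"\x10B"):
    """Return the length-prefixed payload that follows each occurrence of sep."""
    results = []
    for match in re.finditer(re.escape(sep), data):
        offset = match.start()
        # The two bytes after the marker are a big-endian unsigned short length.
        msglen = struct.unpack(">H", data[offset + 2: offset + 4])[0]
        results.append(data[offset + 4: offset + 4 + msglen])
    return results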
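A regex-free variant of the same idea, assuming Python 3 bytes input: find each marker with `bytes.find`, read the 2-byte big-endian length with `struct.unpack_from`, and resume the search after the payload (so, unlike `re.finditer`, it does not report markers that appear inside a payload). Since the question also asks for an all-hex view, a small `hex_dump` helper is included. Both function names are hypothetical.

```python
import struct

def get_data_no_regex(data, sep=b"\x10B"):
    """Slice out the length-prefixed payload after each marker, without regex."""
    results = []
    pos = data.find(sep)
    while pos != -1:
        start = pos + len(sep)
        (msglen,) = struct.unpack_from(">H", data, start)  # 2-byte big-endian length
        results.append(data[start + 2:start + 2 + msglen])
        pos = data.find(sep, start + 2 + msglen)  # continue after this payload
    return results

def hex_dump(data):
    """Show every byte as \\xNN, even printable ones such as B."""
    return "".join("\\x%02x" % b for b in data)

sample = b"\x00\x03\x10B\x00\x0e12102 G1103543\x10T\x07\x21"
print(get_data_no_regex(sample))  # [b'12102 G1103543']
print(hex_dump(sample[:4]))       # \x00\x03\x10\x42
```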

python proper way to decode raw udp packet

I'm making a script to get Valve's server information (players online, map, etc.).
The packet I get when I request information is this:
'\xff\xff\xff\xffI\x11Stargate Central CAP SBEP\x00sb_wuwgalaxy_fix\x00garrysmod\x00Spacebuild\x00\xa0\x0f\n\x0c\x00dw\x00\x0114.09.08\x00\xb1\x87i\x06\xb4g\x17.\x15#\x01gm:spacebuild3\x00\xa0\x0f\x00\x00\x00\x00\x00\x00'
This may help you to see what I'm trying to do https://developer.valvesoftware.com/wiki/Server_queries#A2S_INFO
The problem is that I don't know how to decode this properly: it's easy to get the strings, but I have no idea how to get other types like byte and short, for example '\xa0\x0f'.
For now I'm doing multiple splits, but do you know a better way of doing this?
Python has functions for encoding/decoding different data types into bytes. Take a look at the struct package, the functions struct.pack() and struct.unpack() are your friends there.
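Adapted from https://docs.python.org/2/library/struct.html (the docs' version uses native byte order and sizes, so its output depends on the machine; the '>' prefix here gives big-endian byte order with standard sizes everywhere):
>>> from struct import *
>>> pack('>hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('>hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)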
The first argument of the unpack function defines the format of the data stored in the second argument. Now you need to translate the description given by valve to a format string. If you wanted to unpack 2 bytes and a short from a data string (that would have a length of 4 bytes, of course), you could do something like this:
(first_byte, second_byte, the_short) = unpack("<BBh", data)  # byte-order char first; '<' = little-endian, B = unsigned byte
You'll have to take care yourself, to get the correct part of the data string (and I don't know if those numbers are signed or not, be sure to take care of that, too).
The strings you'll have to handle differently (they are null-terminated here, so start where you know a string starts and read up to the first "\0" byte).
pack() works the other way around and stores data in a byte string. Take a look at the examples in the Python docs and play around with it a bit to get a feel for it (e.g. when a tuple is returned or needed).
struct also takes care of byte order for you. That only matters for multi-byte integers (like short). Network byte order ("!") is common in network protocols, but Valve's server query data is little-endian, so a format string of "<h" should unpack a short like '\xa0\x0f' correctly (it gives 4000).
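A minimal sketch of decoding the start of the packet from the question. It assumes the A2S_INFO field order from the Valve wiki page linked above (header, protocol, four NUL-terminated strings, a little-endian short app ID, then single-byte fields up to VAC, then the version string) and stops before the extra data flag; game-specific fields such as those for "The Ship" are not handled. `read_cstring` and `parse_a2s_info` are illustrative names.

```python
import struct

def read_cstring(data, offset):
    """Read a NUL-terminated string starting at offset; return (text, new_offset)."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("utf-8", "replace"), end + 1

def parse_a2s_info(packet):
    offset = 4  # skip the 0xFFFFFFFF prefix
    header, protocol = struct.unpack_from("<BB", packet, offset)
    offset += 2
    info = {"header": header, "protocol": protocol}
    for key in ("name", "map", "folder", "game"):
        info[key], offset = read_cstring(packet, offset)
    # Numeric fields: little-endian short app ID, then single bytes/chars.
    fmt = "<hBBBccBB"
    (info["app_id"], info["players"], info["max_players"], info["bots"],
     server_type, environment, info["visibility"], info["vac"]) = \
        struct.unpack_from(fmt, packet, offset)
    offset += struct.calcsize(fmt)
    info["server_type"] = server_type.decode()    # e.g. 'd' in this packet
    info["environment"] = environment.decode()    # e.g. 'w' in this packet
    info["version"], offset = read_cstring(packet, offset)
    return info

packet = (b"\xff\xff\xff\xffI\x11Stargate Central CAP SBEP\x00sb_wuwgalaxy_fix\x00"
          b"garrysmod\x00Spacebuild\x00\xa0\x0f\n\x0c\x00dw\x00\x0114.09.08\x00"
          b"\xb1\x87i\x06\xb4g\x17.\x15#\x01gm:spacebuild3\x00\xa0\x0f\x00\x00\x00\x00\x00\x00")
info = parse_a2s_info(packet)
print(info["name"], info["map"], info["app_id"], info["players"], info["max_players"])
```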

How Does One Read Bytes from File in Python

Similar to this question, I am trying to read in an ID3v2 tag header and am having trouble figuring out how to get individual bytes in python.
I first read all ten bytes into a string. I then want to parse out the individual pieces of information.
I can grab the two version number chars in the string, but then I have no idea how to take those two chars and get an integer out of them.
The struct package seems to be what I want, but I can't get it to work.
Here is my code so far (I am very new to Python, btw... so take it easy on me):
def __init__(self, ten_byte_string):
    self.whole_string = ten_byte_string
    self.file_identifier = self.whole_string[:3]
    self.major_version = struct.pack('x', self.whole_string[3:4]) #this
    self.minor_version = struct.pack('x', self.whole_string[4:5]) # and this
    self.flags = self.whole_string[5:6]
    self.len = self.whole_string[6:10]
Printing out any value except the file identifier gives obvious garbage, because the values are not formatted correctly.
If you have a string with 2 bytes that you wish to interpret as a 16-bit integer, you can do so with:
>>> s = '\0\x02'
>>> struct.unpack('>H', s)
(2,)
Note that the > is for big-endian (the most significant part of the integer comes first). This is the byte order ID3 tags use.
For other sizes of integer, you use different format codes, e.g. "i" for a signed 32-bit integer. See help(struct) for details.
You can also unpack several elements at once, e.g. for 2 unsigned shorts followed by a signed 32-bit value:
>>> a,b,c = struct.unpack('>HHi', some_string)
Going by your code, you are looking for (in order):
a 3 char string
2 single byte values (major and minor version)
a 1 byte flags variable
a 32 bit length quantity
The format string for this would be:
ident, major, minor, flags, size = struct.unpack('>3sBBBI', ten_byte_string)  # 'size' avoids shadowing the built-in len
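One caveat worth checking against the ID3v2 specification: the 4-byte size field in the tag header is a "synchsafe" integer, where only the low 7 bits of each byte are used, so the raw 'I' value is not the tag size directly. A minimal sketch of decoding it (the helper name is hypothetical); the result is the size of the tag after the 10-byte header.

```python
import struct

def parse_id3v2_header(header):
    """Hypothetical helper: split a 10-byte ID3v2 header into its fields."""
    ident, major, minor, flags, size_bytes = struct.unpack(">3sBBB4s", header)
    # Synchsafe integer: each byte carries 7 bits (bit 7 is always 0),
    # most significant group first, giving a 28-bit size.
    size = 0
    for b in size_bytes:
        size = (size << 7) | (b & 0x7F)
    return ident, major, minor, flags, size

header = b"ID3\x03\x00\x00\x00\x00\x02\x01"
print(parse_id3v2_header(header))  # (b'ID3', 3, 0, 0, 257)
```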
Why write your own? (Assuming you haven't checked out these other options.) There are a couple of options out there for reading ID3 tag info from MP3s in Python. Check out my answer over at this question.
I am trying to read in an ID3v2 tag header
FWIW, there's already a module for this.
I was going to recommend the struct package, but then you said you had tried it. Try this:
self.major_version, self.minor_version = struct.unpack('BB', self.whole_string[3:5])
The pack() function converts Python values to bytes, and the unpack() function converts bytes back to Python values (always returned as a tuple).
