I'm trying to reverse engineer the Quasar RAT protobuf protocol structure.
Quasar is an open-source Remote Administration Tool written in C#; it can be found online here:
https://github.com/quasar/QuasarRAT
I've managed to reverse most of it and I can now connect to the Quasar server from a Python script. However, one question remains open: every byte stream sent from the client to the server begins with a 3-byte field which is not declared in any protobuf class within Quasar. This field seems to give the length of the message, not including the prefixed bytes. As an example, for a prefixed byte stream generated for an array of size 0x2D2, these are the bytes prefixed to the message:
0x0A, 0xCF, 0x05
If, however, I change the message fields before serializing the message, this byte stream changes, except for the first 0x0A byte. It seems that if I keep appending bytes to the message fields the second byte grows, and if I overflow the second byte (make it go above 0xFF) the third byte is incremented and the second byte resets to 0x80. But the math won't make sense to me at all, as this field should give the size of the array but doesn't under any sensible formula that I could compute. I know that protobuf-net can generate length-prefix bytes to prefix the message with its length, but that is not what is happening here.
Any help would be appreciated.
The encoding rules are here: https://developers.google.com/protocol-buffers/docs/encoding
Basically, each field is encoded as a field header (aka "tag"), followed by a payload. The field header is a "varint" (see the encoding guide) whose value is an integer composed of a field number and a wire type. The wire type is the 3 least significant bits, and the field number is the rest (shifted right by 3 bits). In the case of 0x0A (binary 0000 1010), the wire type is 2 (binary 010) and the field number is 1.
How you treat the payload depends on the wire type. For wire type 2 (length prefixed), you should expect next:
a varint that is the length of the payload in bytes, then
that many bytes of the actual payload
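To make this concrete, here is a minimal Python sketch (an illustration, not code taken from Quasar) that decodes a varint and then interprets the three example bytes from the question:

def read_varint(data, pos=0):
    # Decode a base-128 varint starting at data[pos]; return (value, next position).
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        result |= (byte & 0x7F) << shift  # low 7 bits of each byte carry the value
        pos += 1
        if not byte & 0x80:               # high bit clear means this was the last byte
            return result, pos
        shift += 7

prefix = bytes([0x0A, 0xCF, 0x05])        # the example prefix from the question

tag, pos = read_varint(prefix)            # field header
field_number, wire_type = tag >> 3, tag & 0x07
length, pos = read_varint(prefix, pos)    # payload length

print(field_number, wire_type, hex(length))   # prints: 1 2 0x2cf

So 0x0A is a field header (field 1, wire type 2) and 0xCF 0x05 is a varint length of 0x2CF (719) bytes; because each byte of a varint contributes only 7 bits of the value, the prefix does not grow linearly with the payload size, which matches the behaviour you observed.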
Unfortunately protobuf is ambiguous without a schema, so knowing that you have length prefixed data doesn't tell you what the data is; a length prefixed payload could be:
a UTF-8 string
a raw BLOB (bytes)
a sub-message
a "packed" array of some primitive type (integers/floating point numbers/etc) - remembering that the length prefix is the number of bytes, not the number of elements; the elements are not even necessarily fixed size (they could themselves be varints
In many ways, the purpose of the wire type isn't to tell you how to interpret the data; it is to tell you how to skip (or just store verbatim) the field if it isn't one you know about. For example, somebody else is using V3 of the API and you have only updated your schema to V2; they send a V3 message to your V2 API; V3 has extra fields you don't care about - the deserializer needs to not break when it hits them, so the wire type tells it how to ignore the field (i.e. what the rules are for finding the next field). Otherwise, we could just use the schema information and not store the wire type in the payload at all (although it is also used as an optimization on repeated primitive data, via "packed" arrays - it is up to the serializer whether it encodes such as length-prefixed vs lots of field header/value pairs).
Related
I am trying to read serial data from a device that outputs in a mix of ASCII and binary, using Python 3.
The message format is: "$PASHR,msg type,binary payload,checksum,\r\n" (minus the quotes)
To make it more interesting, there are several different message types, and they have different payload lengths, so I can't just read X bytes (I can infer the payload length based on the message type). The sequence of a variable number of messages of each type (around 15 in total) is sent every 20 seconds, at 115200 baud.
I haven't been able to read this with serial.readline(), probably because of newlines embedded in the payload.
I think that if I could set the line-end character to the sequence "$PASHR" that would give me a way to frame the messages -- ie, everything between one $PASHR and the next is one message, and the likelihood of seeing the sequence in the binary payload is nil. But I have not been able to make it work either using serial.newline = b'$PASHR' or readuntil(b'$PASHR') -- I still get variable length reads. I suspect that the serial.timeout setting enters into the solution, but I am not sure how.
Here's the last version I ended up with last night:
delimiter = b'$PASHR'
while True:
    if self.serial.in_waiting:
        message = self.serial.read_until(delimiter)
Every method I've tried gives a variable response length, when each record of a given type should be the same length. Is there a way to set the newline to that multi-character string, or is there a different/better approach I should use?
Thanks!
I think I figured it out. This seems to work with a timeout of 1 second:
self.serial.timeout = 1
delimiter = b'$PASHR'
if self.serial.in_waiting:
    message = self.serial.read_until(delimiter)
    if len(message) > 6:  # ignore carryover b'$PASHR'
        ***process message***
When I looked carefully at the variable length returns, I discovered that the message sequence after the first incomplete cycle was:
b'$PASHR' (6 bytes)
b',MPC,*data**checksum*\r\n$PASHR' (108 bytes)
...
b',MPC,*data**checksum*\r\n' (102 bytes)
(delay)
b'$PASHR' (6 bytes)
The delimiter from the last message in the sequence is chopped off and output at the beginning of the next sequence. I can deal with that.
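For example, one way to deal with it (just a sketch using the same self.serial object and delimiter as above; process_message is a hypothetical handler) is to strip the delimiter off both ends before processing:

delimiter = b'$PASHR'

message = self.serial.read_until(delimiter)

# Drop the delimiter fragment carried over from the previous cycle, if any,
# and the delimiter that read_until() leaves on the end of this message.
body = message
if body.startswith(delimiter):
    body = body[len(delimiter):]
if body.endswith(delimiter):
    body = body[:-len(delimiter)]

if body:  # whatever is left is one ",msg type,payload,checksum,\r\n" record
    process_message(body)  # hypothetical handler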
I will try Deepstop's idea of the inter_byte_timeout, as it is probably more elegant than using the raw timeout, and it might get around the partial-message problem.
I am just starting out with Python scripting and I am trying to write a program that will parse through a provided MBR but I'm not sure how to start.
I want to write a program that will parse a portion of the MBR's partition table. The first partition entry is located at the address 1BE. Print out the status byte (1 byte located at the starting address), the partition type (1 byte located at the address 1BE + 4) and the address of the first sector in the partition (1BE + 8).
Any help would be greatly appreciated!
Batteries included. Use the array or struct module.
Or else one of these (but here they're likely overkill):
https://github.com/digidotcom/python-suitcase
https://github.com/vstinner/hachoir3
https://github.com/Muterra/py_smartyparse
I know this is a very old question, but I came here looking for an answer and the only one here did not really answer the question itself. I believe I have a proper understanding of the question and answer at this point. So, starting with the first part, the status byte: the status byte is typically a value like 0x80, which is the active (bootable) flag, and it is only one byte long. It can be found with the following lines:
import struct  # we will need struct.unpack() later for the 4-byte first-sector address

binary_file = open(file, 'rb')  # 'file' is the path to your disk image
mbr = binary_file.read(512)     # the first 512 bytes are the first sector, which is the MBR
binary_file.close()

status_flag = mbr[0x1BE]        # the first partition entry starts at offset 0x1BE
The status flag is only a single byte, and because we know it is located at offset 0x1BE we can simply pull that index from the mbr data (what we gathered when we read the file, where each index is one byte). Another way to read 0x1BE is as the integer 446, so we are really looking at the byte stored at mbr[446] in the example above (the 0x prefix tells Python to interpret the literal as hexadecimal, and 0x1BE equals 446).
Moving onto the second part, similarly to the first part, the partition type is a single byte stored at the address 0x1BE+4 or 0x1C2. So, to find this, much like with the status byte, we are able to simply do:
partition_type = mbr[0x1C2]
Because the partition type is also just a byte, and each index of our mbr array is a byte, we can simply pull the value at the address 0x1C2.
As for the last part, the address of the first sector is a 4-byte value that starts at the address 0x1BE+8, or 0x1C6. Because it is 4 bytes long, we know it ends just before 0x1BE+12, or 0x1CA. So, to find this, we can do the following:
first_sector_addr = struct.unpack('<I', mbr[0x1C6:0x1CA])[0]  # unpack returns a tuple; [0] takes its only element
'''
For the line above, we are using the unpack function also
included with the struct import. This function takes two
primary arguments: the byte order/size/alignment, and the
data to read (https://docs.python.org/3/library/struct.html).
We must read the data as little-endian and as an unsigned int
(https://thestarman.pcministry.com/asm/mbr/PartTables.htm).
'''
Once we have all of the variables collected (status_flag, partition_type, first_sector_addr) we can print each of them to the screen. I recommend printing the first two as hex values as these are what are used for identification. For example, if the partition type has the hex value 0x83 it is a Linux Native file system (https://thestarman.pcministry.com/asm/mbr/PartTypes.htm)
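For instance, a minimal way to print them (just a sketch in Python 3, reusing the variable names from above):

print(f"Status flag:    0x{status_flag:02X}")     # e.g. 0x80 means active/bootable
print(f"Partition type: 0x{partition_type:02X}")  # e.g. 0x83 is a Linux native file system
print(f"First sector:   {first_sector_addr}")     # LBA of the partition's first sector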
https://thestarman.pcministry.com/asm/mbr/PartTables.htm
https://en.wikipedia.org/wiki/Master_boot_record#Sector_layout
https://www.ijais.org/research/volume10/number8/sadi-2016-ijais-451541.pdf
(Last link will prompt for pdf download, but is a useful resource on MBR. I think that is why I had to post it as code rather than text)
I'm using struct.unpack to read the 11th byte of a file to the 21st byte which represents a field that is supposed to read 'SNA'. The field is 'populated as BCS-A where it is left justified and padded to the right boundary with BCS spaces'. Since the field is 10 bytes long, my format string is '10s'. However, per the output mentioned, the remaining 7 bytes are spaces. To eliminate those spaces I use strip. Unfortunately, this still yields 'SNA\x00'. What am I doing wrong?
field = struct.unpack('10s', data[start:stop])
field[0].strip()  # since the output of struct.unpack is a tuple
Your data doesn't conform to the standard you've specified. Either contact your data supplier and have them fix their bug, or be more generous about your definition of "space". If you want to accept that data, you could, for example, do this:
field[0].strip(' \t\n\x00')
or, with more limited acceptance:
field[0].strip().rstrip('\x00')
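A quick illustration with made-up data (the 10-byte value below is an assumption based on the output described in the question); note that in Python 3 the unpacked value is bytes, so strip() needs a bytes argument:

import struct

data = b'SNA\x00      '                  # hypothetical field: 'SNA', a stray NUL, then pad spaces
(field,) = struct.unpack('10s', data)    # unpack always returns a tuple, here with one element

print(field.strip(b' \t\n\x00'))         # b'SNA' - treats NUL as just another kind of "space"
print(field.strip().rstrip(b'\x00'))     # b'SNA' - strips the NUL only after the spaces are gone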
I'm currently working on a project where I need to communicate to a microhard cellular modem IPn3G. I have the modem set up to send messages to my computer through TCP and I can pick up the message in a socket.
The message looks like this though:
���������DKReadyCANRogersWirelessInc. Home354626030393530302720391029547
Now, I can recognize a few of these fields like the Status or the Carrierinfo as well as the imei and imsi in the end.
My problem is, how do I parse the funny looking things? I have tried struct, but it didn't seem to help me out very much.
In the documentation of the modem I only found this:
Modem_event message structure:
fixed header (fixed size 20 bytes)
Modem ID (uint64_t (8 bytes))
Message type mask (uint8_t(1 byte))
reserved
packet length (uint16_t(2 bytes))
Note: packet length = length of fixed header + length of message payload.
Carrier info:
Content length 2 BYTES (UINT16_T)
RSSI 1 BYTE (UINT8_T)
RF Band 2 BYTES (UINT16_T)
Service type STRING (1-30 Bytes)
Channel number STRING (1-30 Bytes)
SIM card number STRING (1-30 Bytes)
Phone number STRING (1-30 Bytes)
To me it seems like the message doesn't even line up with what it's supposed to be. I would be very glad if anyone had advice on how to tackle this problem.
Thank you
Python has a great struct module, which allows you to pack and unpack binary data.
I see that you've already tried to use it. I don't have the documentation for your device (go and check it!), but I suppose the strings are null-terminated (their size isn't given anywhere before them).
Get the size of the string part of the message (from the size of the other fields and the packet length), read all of the strings into one Python string using <number>s, and split them on the null characters.
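A rough sketch of that idea (the field offsets inside the 20-byte header and the byte order are assumptions on my part, since I don't have the device documentation; adjust them to whatever the manual actually says):

import struct

def parse_modem_event(packet):
    # Fixed 20-byte header: modem ID (8) + message type mask (1) + reserved + packet length (2).
    # Exactly where the length field sits inside the header is a guess (here: the last 2 bytes).
    modem_id, msg_type = struct.unpack_from('>QB', packet, 0)
    packet_length = struct.unpack_from('>H', packet, 18)[0]

    payload = packet[20:packet_length]   # packet length = header + payload, per the docs quoted above

    # Carrier info: content length (2) + RSSI (1) + RF band (2), then the variable-length strings.
    content_length, rssi, rf_band = struct.unpack_from('>HBH', payload, 0)

    # Assumption: the strings are null-terminated and packed one after another.
    strings = [s.decode('ascii', 'replace') for s in payload[5:].split(b'\x00') if s]

    return modem_id, msg_type, rssi, rf_band, strings

Running that against a captured message like the one you pasted should quickly show whether the assumed offsets line up with reality.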
I have a web application developed in Adobe Flex 3 and Python 2.5 (deployed on Google App Engine). A RESTful web service has been created in Python and its results are currently in an XML format which is being read by Flex using the HttpService object.
Now the main objective is to compress the XML so that there is less time between the HttpService send() call and the result event. I looked up the Python docs and managed to use zlib.compress() to compress the XML result.
Then I set the HttpService result type from "xml" to "text" and tried using ByteArrays to uncompress the string back to XML. Here's where I failed. I am doing something like this:
var byteArray:ByteArray = new ByteArray();
byteArray.writeUTF( event.result.toString() );
byteArray.uncompress();
var xmlResult:XML = byteArray.readUTF();
It's throwing an exception at byteArray.uncompress() saying it is unable to uncompress the byteArray. Also, when I trace the length of the byteArray, it is 0.
Unable to figure out what I'm doing wrong. All help is appreciated.
-- Edit --
The code:
# compressing the xml result in Python
print zlib.compress(xmlResult)
# decompresisng it in AS3
var byteArray:ByteArray = new ByteArray();
byteArray.writeUTF( event.result.toString() );
byteArray.uncompress()
Event is of type ResultEvent.
The error:
Error: Error #2058: There was an error decompressing the data.
The error could be because the value of byteArray.bytesAvailable is 0, which means the raw bytes Python generated haven't been written into the byteArray properly.
-- Sri
What is byteArray.writeUTF( event.result.toString() ); supposed to do? The result of zlib.compress() is neither unicode nor "UTF" (meaningless without a number after it!?); it is binary aka raw bytes; you should neither decode it nor encode it nor apply any other transformation to it. The receiver should decompress immediately the raw bytes that it receives, in order to recover the data that was passed to zlib.compress().
Update What documentation do you have to support the notion that byteArray.uncompress() is expecting a true zlib stream and not a deflate stream (i.e. a zlib stream after you've snipped the first 2 bytes and the last 4)?
The Flex 3 documentation of ByteArray gives this example:
bytes.uncompress(CompressionAlgorithm.DEFLATE);
but unhelpfully doesn't say what the default (if any) is. If there is a default, it's not documented anywhere obvious, so it would be a very good idea for you to use
bytes.uncompress(CompressionAlgorithm.ZLIB);
to make it obvious what you intend.
AND the docs talk about a writeUTFBytes method, not a writeUTF method. Are you sure that you copy/pasted the exact receiver code in your question?
Update 2
Thanks for the URL. Looks like I got hold of the "help", not the real docs :=(. A couple of points:
(1) Yes, there is an explicit inflate() method. However uncompress DOES have an algorithm arg; it can be either CompressionAlgorithm.ZLIB (the default) or CompressionAlgorithm.DEFLATE ... interestingly the latter is however only available in Adobe Air, not in Flash Player. At least we know the uncompress() call appears OK, and we can get back to the problem of getting the raw bytes onto the wire and off again into a ByteArray instance.
(2) More importantly, there are both writeUTF (Writes a UTF-8 string to the byte stream. The length of the UTF-8 string in bytes is written first, as a 16-bit integer, followed by the bytes representing the characters of the string) and writeUTFBytes (Writes a UTF-8 string to the byte stream. Similar to the writeUTF() method, but writeUTFBytes() does not prefix the string with a 16-bit length word).
Whatever the merits of supplying UTF8-encoded bytes (nil, IMHO), you don't want a 2-byte length prefix there; using writeUTF() is guaranteed to cause uncompress() to bork.
Getting it on to the wire: using Python print on binary data doesn't seem like a good idea (unless sys.stdout has been nobbled to run in raw mode, which you didn't show in your code).
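For illustration (the question's server side is Python 2.5 on GAE, so treat this as a sketch of the idea rather than drop-in code):

import sys
import zlib

xml_result = '<result><row id="1"/></result>'   # hypothetical XML payload
compressed = zlib.compress(xml_result)          # raw zlib bytes - don't decode/encode them further

# print compressed   <- appends a newline and may translate line endings, corrupting the stream
# Instead, write the raw bytes straight to the response, e.g. in a GAE webapp handler something like:
#     self.response.headers['Content-Type'] = 'application/octet-stream'
#     self.response.out.write(compressed)
sys.stdout.write(compressed)                    # Python 2 style, matching the question's setup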
Likewise, doing event.result.toString() to get a string (similar to a Python unicode object, yes/no? decoded using what?) and then encoding it in UTF-8 seems rather unlikely to work.
Given I didn't know that flex existed until today, I really can't help you effectively. Here are some further suggestions towards self-sufficiency in case nobody who knows more flex comes along soon:
(1) Do some debugging. Start off with a minimal XML document. Show repr(xml_doc). Show repr(zlib_compress_output). In (a cut-down version of) your flex script, use the closest function/method to repr() that you can find to show: event.result, event.result.toString() and the result of writeUTF*(). Make sure you understand the effects of everything that can happen after zlib.compress(). Reading the docs carefully may help.
(2) Look at how you can get raw bytes out of event.result.
HTH,
John