How to parse binary data from socket in python?

How to parse binary data from socket in python? - python

I'm currently working on a project where I need to communicate to a microhard cellular modem IPn3G. I have the modem set up to send messages to my computer through TCP and I can pick up the message in a socket.
The message looks like this though:
���������DKReadyCANRogersWirelessInc. Home354626030393530302720391029547
Now, I can recognize a few of these fields like the Status or the Carrierinfo as well as the imei and imsi in the end.
My problem is, how do I parse the funny looking things? I have tried struct, but it didn't seem to help me out very much.
In the documentation of the modem I only found this:
Modem_event message structure:
fixed header (fixed size 20 bytes)
Modem ID (uint64_t (8 bytes))
Message type mask (uint8_t(1 byte))
reserved
packet length (uint16_t(2 bytes))
Note: packet length = length of fixed header + length of message payload.
Carrier info:
Content length 2 BYTES (UINT16_T)
RSSI 1 BYTE (UINT8_T)
RF Band 2 BYTES (UINT16_T)
Service type STRING (1-30 Bytes)
Channel number STRING (1-30 Bytes)
SIM card number STRING (1-30 Bytes)
Phone number STRING (1-30 Bytes)
To me it seems like the message doesn't even line up with what it's supposed to be. I would be very glad if anyone had advice on how to tackle this problem.
Thank you

Python has great struct module, which allows you to pack and unpack binary data.
I see, that you've already tried to use it. I don't have a documentation, for your device (go and check it!), but I can suppose, that strings are null-terminated (they don't have their size given anywhere before).
Get the size of the string part of the message (from size of the other fields and packet length) and read all strings into one Python string using <number>s and split them finding the null characters.

Related

Python Modbus String decoding issue

I have an issue with the pymodbus decoder with strings. For example, when I try to read 'abcdefg' pymodbus gives me 'badcfehg'. The byteorder and the wordorder don't change the result.
Here is my code:
result=client.read_holding_registers(25000,4)
decoder = BinaryPayloadDecoder.fromRegisters(result.registers,byteorder=Endian.Little,wordorder=Endian.Big)
decoder.decode_string(8)
Can someone explain why the order does not change the result? I try with the builder and it's the same problem. However, I don't have this problem with 32 bits floats for example.
I also tried with an older version of pymodbus and it works:
decoder = BinaryPayloadDecoder.fromRegisters(registers,endian=Endian.Little)
Note: I already read the following topic: pymodbus: Issue reading String & multiple type of data from Modbus device but I don't have any access to the modbus server.

The problem is that Modbus specs does not define in what order the two bytes for char strings are sent or even in what order 16-bit words are sent for 32-bit types.
Then some Modbus devices send bytes or words in an order and others do the opposite.
If you are writing a Modbus client then you should add the option in the configuration to be able to invert the order of both bytes and 16-bit words in 32-bit data types.
https://en.wikipedia.org/wiki/Endianness

I've encountered this same issue with byteorder not working (wordorder seems to be fine). The solution I came up with is to use Struct:
import struct
Then:
count = 4 #Read 4 16bit registers
result = client.read_holding_registers(25000,count)
for i in range(count):
result.registers[i] = struct.unpack("<H", struct.pack(">H", result.registers[i]))[0]
decoder = BinaryPayloadDecoder.fromRegisters(result.registers)
print(decoder.decode_string(7)) #Since string is 7 characters long
This uses Struct to unpack and pack as an unsigned short integer. The endianness does not matter since all you're doing is swapping bytes. The result overwrites the registers so you can then use the BinaryPayloadDecoder as you normally would.
I would have preferred to iterate through the responses instead of using range(count), but couldn't find a way to do it and wanted to post this workaround. If I figure it out, I will let you know.

Protobuf-net unrecognized stream prefix

I'm trying to reverse engineer the Quasar RAT protobuf protocol structure.
Quasar is a Remote Administration Tool written in C# which is open source and can be found online here.
https://github.com/quasar/QuasarRAT
I've managed to reverse most of it and I can now connect to the Quasar server client from a python script. How ever one question remains open, it appears that every byte stream that is being sent from the client to the server begins with a 3 byte field which is not registered within the protobuf class within Quasar. This field seems to provide the length of the message not including the prefixed bytes. As can be seen within this block for an example a prefixed byte stream generated for an array of size 0x2d2, these are the prefixed bytes being appended to the message.
0x0A, 0xCF, 0x05
If how ever I decide to change the message fields before serializing the message, this byte stream would change except from the first 0x0A byte. It seems that if I keep appending bytes to the message fields the second byte grows and if I overflow the second byte(make it reach above 0xff) - it would increment the third byte and reset the second byte to 0x80. But the math wont make sense to me at all as this field should return the size of the array but doesn't under any sensible formula that I could compute. I know that protobuf-net can generate PreLengthPrefix bytes to prefix the message with the length of it but this is not the case here.
Any help would be appreciated.

The encoding rules are here: https://developers.google.com/protocol-buffers/docs/encoding
Basically, each field is a encoded as a field-header (aka "tag"), followed by a payload. The field-header is a "varint" (see the encoding guide), the value of which is an integer composed of a field number and the wire-type. The wire-type is the 3 least significant bits, and the field number is the rest (shifted by 3 bits). In the case of 0x0A (binary 1010), the wire type is 2 (binary 010), and the field number is 1.
How you treat the payload depends on the wire type. For wire type 2 (length prefixed), you should expect next:
a varint that is the length of the payload in bytes, then
that many bytes of the actual payload
Unfortunately protobuf is ambiguous without a schema, so knowing that you have length prefixed data doesn't tell you what the data is; a length prefixed payload could be:
a UTF-8 string
a raw BLOB (bytes)
a sub-message
a "packed" array of some primitive type (integers/floating point numbers/etc) - remembering that the length prefix is the number of bytes, not the number of elements; the elements are not even necessarily fixed size (they could themselves be varints
In many ways, the purpose of the wire type isn't to tell you how to interpret the data; it is to tell you how to skip (or just store verbatim) the field if it isn't one you know about. For example, somebody else is using V3 of the API and you have only updated your schema to V2; they send a V3 message to your V2 API; V3 has extra fields you don't care about - the deserializer needs to not break when it hits them, so the wire type tells it how to ignore the field (i.e. what the rules are for finding the next field). Otherwise, we could just use the schema information and not store the wire type in the payload at all (although it is also used as an optimization on repeated primitive data, via "packed" arrays - it is up to the serializer whether it encodes such as length-prefixed vs lots of field header/value pairs).

Reading mixed ASCII/binary serial data stream -- how to find delimiter?

I am trying to read serial data from a device that outputs in a mix of ASCII and binary, using Python 3.
The message format is: "$PASHR,msg type,binary payload,checksum,\r\n" (minus the quotes)
To make it more interesting, there are several different message types, and they have different payload lengths, so I can't just read X bytes (I can infer the payload length based on the message type). The sequence of a variable number of messages of each type (around 15 in total) is sent every 20 seconds, at 115200 baud.
I haven't been able to read this with serial.readline(), probably because of newlines embedded in the payload.
I think that if I could set the line-end character to the sequence "$PASHR" that would give me a way to frame the messages -- ie, everything between one $PASHR and the next is one message, and the likelihood of seeing the sequence in the binary payload is nil. But I have not been able to make it work either using serial.newline = b'$PASHR' or readuntil(b'$PASHR') -- I still get variable length reads. I suspect that the serial.timeout setting enters into the solution, but I am not sure how.
Here's the last version I ended up with last night:
delimiter = b'$PASHR'
while True:
if self.serial.in_waiting:
message = self.serial.read_until(delimiter)
Every method I've tried gives a variable response length, when each record of a give should be the same length. Is there a way to set the newline to that multi-character string, or is there a different/better approach I should use?
Thanks!

I think I figured it out. This seems to work with a timeout of 1 second:
self.serial.timeout = 1
delimiter = b'$PASHR'
if self.serial.in_waiting:
message = self.serial.read_until(delimiter)
if len(message) > 6: # ignore carryover PASHR'
***process message***
When I looked carefully at the variable length returns, I discovered that the message sequence after the first incomplete cycle was:
b'$PASHR' (6 bytes)
b',MPC,*data**checksum*\r\n$PASHR' (108 bytes)
...
b',MPC,*data**checksum*\r\n' (102 bytes)
(delay)
b'$PASHR' (6 bytes)
The delimiter from the last message in the sequence is chopped off and output at the beginning of the next sequence. I can deal with that.
I will try Deepstop's idea of the inter_byte timeout as it is probably more elegant than using the raw timeout, and might get around the partial message problem.

Python Parsing MBR

I am just starting out with Python scripting and I am trying to write a program that will parse through a provided MBR but I'm not sure how to start.
I want to write a program that will parse a portion of the MBR's partition table. The first partition entry is located at the address 1BE. Print out the status byte (1 byte located at the starting address), the partition type (1 byte located at the address 1BE + 4) and the address of the first sector in the partition (1BE + 8).
Any help would be greatly appreciated!

Batteries included. Use the array or struct module.
Or else one of these (but here they're likely overkill):
https://github.com/digidotcom/python-suitcase
https://github.com/vstinner/hachoir3
https://github.com/Muterra/py_smartyparse

I know this is a very old question, but I came here looking for an answer and the only one here did not answer the question itself very well. I believe I have a proper understanding of the question and answer at this point; So, starting with the first part which is the status address. A status address is typically something like: 0x80, which is an active status flag, and is only a byte long. This can be found with the following lines:
import struct # This is where we get our bytearray() structure
mbr = bytearray() # We want each index of our array to be a byte
binary_file = open(file, 'rb')
mbr = binary_file.read(512) # The first 512 bytes are the first sector, which is the MBR
status_flag = mbr[0x1BE]
The status flag is only a single byte, and because we know it is located at the address 0x1BE we are able to simply pull that index from the MBR array (what we gathered when we read the file but broken into 1 byte chunks). Another way to read 0x1BE could be as the integer 446; so we are really looking at the byte stored in the index mbr[446] in the example above (Because we start with 0x Python knows to interpret it as a hex value, so 446 is 0x1BE).
Moving onto the second part, similarly to the first part, the partition type is a single byte stored at the address 0x1BE+4 or 0x1C2. So, to find this, much like with the status byte, we are able to simply do:
partition_type = mbr[0x1C2]
Because the partition type is also just a byte, and each index of our mbr array is a byte, we can simply pull the value at the address 0x1C2.
As for the last part, the address of the first sector is a 4-byte value that starts at the address: 0x1BE+8 or 0x1C6. Because it is bytes, we know that it ends at the address 0x1BE+12 or 0x1CA. So, to find this, we can do the following:
first_sector_addr = struct.unpack('<I', mbr[0x1C6:0x1CA])
'''
For the line above, we are using the unpack function also
included with the struct import. This function takes two
primary arguments: the byte order/size/alignment, and the
data to read (https://docs.python.org/3/library/struct.html).
We must read the data as little-endian and as an unsigned int
(https://thestarman.pcministry.com/asm/mbr/PartTables.htm).
'''
Once we have all of the variables collected (status_flag, partition_type, first_sector_addr) we can print each of them to the screen. I recommend printing the first two as hex values as these are what are used for identification. For example, if the partition type has the hex value 0x83 it is a Linux Native file system (https://thestarman.pcministry.com/asm/mbr/PartTypes.htm)
https://thestarman.pcministry.com/asm/mbr/PartTables.htm
https://en.wikipedia.org/wiki/Master_boot_record#Sector_layout
https://www.ijais.org/research/volume10/number8/sadi-2016-ijais-451541.pdf
(Last link will prompt for pdf download, but is a useful resource on MBR. I think that is why I had to post it as code rather than text)

How to parse encrypted data in function dataRecieved in twisted?

In my project I inherited twisted protocol from twisted.internet.protocol import Protocol, As we know, I should do something packet manipulation in the function dataRecieved, However, the doc said:
data: a string of indeterminate length. Please keep in mind that you will probably need to buffer some data, as partial (or multiple) protocol messages may be received! I recommend that unit tests for protocols call through to this method with differing chunk sizes, down to one byte at a time.
So my data format is that:
packet len | payload
2 bytes | variable bytes
But things become complected if I want to encrypt my data, How should I do then?
Should I encrypt both packet length and payload? then how to judge if the packet is end?
Or should I encrypt only the payload, then alter the packet length ? what if the encrypted payload length is larger than the max value of 2bytes?
Plus: If I use raw socket instead of twisted, can I omit the 2 bytes packet len prefix?
Thanks!

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.