Converting string to bytes prints weird-looking hex-code in console

Introduction to my problem:
I am trying to write a C++ program that receives data over UDP using WinSock2.
For that, I have a prewritten Python script that sends 10-byte packets to a specified port, where the C++ program then receives them. I have gotten the data transfer to work, but I am confused about the data that is being sent.
My problem:
I am running the Python script from cmd, which prints the sent text to the console. I have also added a line to the script that converts the packet to bytes, to verify what exactly is being sent. The first line is the one I added:
logger.debug("Sending packet len %s, data %s", sizeof(packet), bytes(packet))
logger.debug("Sending packet len %s, data %s", sizeof(packet), packet)
And this is the output in my terminal (from 3 different packets sent):
What confuses me is that I would expect the hex code to contain only hexadecimal digits, but instead there are some seemingly random symbols and letters in it as well, even though the printed text looks just fine. Can someone explain where these symbols come from? I am not sure how I am supposed to interpret this information on the receiving end in my C++ code.

Your script does not print plain hex; it prints the repr of the bytes object. Bytes that correspond to printable ASCII characters are shown as those characters, and all other bytes are shown as \xNN escapes. I tested the same string formatting in my interpreter and got the same kind of result:
>>> "%s".encode() % b"\xfa\x3f\x00"
b'\xfa?\x00'
Here \x3f is the hex code of the ASCII character ?.

Python tries to be helpful (whether it actually helps is debatable) by printing bytes which correspond to an ASCII symbol as the ASCII symbol.
>>> t1 = b'test'
>>> t2 = b'\x74\x65\x73\x74'
>>> t1 == t2
True
>>> print(t1, t2)
b'test' b'test'
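If the mixed ASCII-and-escape display is hard to read, one option is to log a pure hex dump instead. The following is a minimal sketch, assuming Python 3.8 or later (for the separator argument of bytes.hex). The packet contents are made up for illustration, since the real packets are not shown in the question:

```python
# Hypothetical 10-byte packet; the real packet contents are not shown above.
packet = b"\xfa\x3f\x00test\x01\x02\x03"

as_repr = repr(packet)     # mixes ASCII characters and \xNN escapes
as_hex = packet.hex(" ")   # every byte as two hex digits (Python 3.8+)
as_ints = list(packet)     # the same bytes as integers in the range 0..255

print(as_repr)
print(as_hex)
print(as_ints)
```

All three lines describe the same 10 bytes. Only the display differs, so the receiving C++ code gets identical data regardless of how Python chose to print it.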

Related

Converting shellcode hex bytes to text based inputs in Python for an unknown byte value '\x87'? Not a UTF-8 string?

So I am currently doing a beginner CTF challenge on pwnable.tw, the "start" challenge specifically. After reversing the challenge binary I found a buffer overflow, and to get an ideal starting point I need to leak the stack address by pointing execution back to a specific address (0x08048087). So I crafted a payload that overwrites the return address with the address I was aiming for. However, I'm having trouble converting the byte data into a form that can be fed to the vulnerable program.
Below is my python code:
from pwn import *
shellcode = b'A' * 20
shellcode += pack(0x08048087, 32)
print(shellcode)
I use the pwn library to simplify packing the address, then I print the result and pipe it into the vulnerable binary as stdin. However, rather than printing the raw characters for the bytes of that address, print gives me this:
b'AAAAAAAAAAAAAAAAAAAA\x87\x80\x04\x08'
That is just a string-literal version of the hex values themselves, which of course will not be interpreted by the program the way I intend. So I tried to decode it as UTF-8 or ASCII, or even to convert it with str, but whichever way I choose I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 20: invalid start byte
It would seem it can't decode the 0x87, which makes sense, since there does not seem to be an equivalent character for it to decode to. But then my question becomes: how can I deliver my shellcode, specifically the hexadecimal address part, to the program so that it interprets that portion of the overflowed buffer as the address I intend, rather than as the stringified version of the hex values that my script currently produces?

So I ended up finding the answer: use sys.stdout.buffer.write() rather than print or sys.stdout.write(). sys.stdout.buffer is a BufferedWriter, which operates on raw bytes, whereas the other two operate on text strings. Thank you to everyone in the comments who helped me!
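A minimal sketch of that approach follows. To stay self-contained it uses struct.pack("<I", ...) in place of pwntools' pack(..., 32), which assumes a little-endian 32-bit target. The helper names build_payload and write_raw are made up for this illustration:

```python
import io
import struct


def build_payload(return_address: int) -> bytes:
    """Return 20 filler bytes followed by the address as 4 little-endian bytes."""
    # Assumption: little-endian 32-bit target, standing in for pack(addr, 32).
    return b"A" * 20 + struct.pack("<I", return_address)


def write_raw(payload: bytes, stream) -> None:
    """Write the bytes unchanged to a binary stream and flush it."""
    stream.write(payload)
    stream.flush()


payload = build_payload(0x08048087)

# Demonstrated on an in-memory binary stream. For a real run, import sys and
# call write_raw(payload, sys.stdout.buffer) so the bytes reach the pipe as-is.
demo = io.BytesIO()
write_raw(payload, demo)
```

No decoding step is involved at any point, so the 0x87 byte never has to be a valid UTF-8 character.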

Attempting to debug hex instruction, but python clears my console?

I'm writing a driver and am concatenating some hex instructions based on a few conditionals. Up until this point, all instructions have worked as intended.
A new instruction I was working on isn't working as intended, so I attempted to print out the instruction after concatenation and before execution to see what was wrong.
msg = '\xc2%s%s' % ('\x1b\x63', '07')
assert self.dev.ctrl_transfer(0x21, 9, 0x0300, 0, msg) == len(msg)
print(msg)
When I print it after concatenation it clears the console and prints '07' and then continues the rest of the driver execution. I'm able to print and execute every other instruction I've concatenated, such as the following, without issue.
msg = '\xc2%s%s' % ('\x1b\x72', '07')
Does anyone have an idea why this is happening? Does the '\x63' byte tell python to do something I'm unaware of? It should just be concatenated to the rest of the instruction, followed by the '\x07' byte. Note, that if I include the '\x' before the '07' (unlike my code above) it still does the same thing, it just doesn't print '07', it leaves a blank line.
Thanks!

The character '\x63' is the same character as 'c' (and a half-dozen other ways to spell it). The letter c doesn't mean anything special to Python.
The character '\x1b' right before the c is Escape. That doesn't mean anything special to Python either, but it probably does to your terminal. Most terminals use "escape sequences" that start with Escape and end with a letter to do things like scroll up, change the main text color, or clear the screen. In particular, on VT100-compatible terminals Escape followed by c is the full reset sequence, which clears the screen.
If this is getting in the way of an interactive debugging session, you may want to consider printing the repr of the string rather than the string itself. The easiest way to do that is to not even use print:
>>> msg = b'\x1b\x63'
>>> msg
b'\x1bc'
>>> print(repr(msg))
b'\x1bc'
Notice that either way, it includes the b and the quotes, and that it hex-escapes all non-printable bytes. And it works basically the same with Unicode strings instead of byte strings:
>>> msg = '\x1b\x63'
>>> msg
'\x1bc'
>>> print(repr(msg))
'\x1bc'
If you're using Python 2.x, you'll have u prefixes instead of none on the Unicode ones, and no prefixes instead of b on the bytes, but basically the same.
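For the exact instruction from the question, a small Python 3 sketch is shown below. It uses the built-in ascii() rather than repr(), because in Python 3 repr() leaves printable non-ASCII characters such as '\xc2' unescaped:

```python
# The instruction from the question: 0xC2, then Escape (0x1B) and 'c', then "07".
msg = '\xc2%s%s' % ('\x1b\x63', '07')

# ascii() escapes every non-ASCII and non-printable character, so nothing
# in the printed text can be interpreted by the terminal as a control sequence.
safe = ascii(msg)
print(safe)

# The individual code points as two-digit hex, one entry per character.
codes = [format(ord(ch), '02x') for ch in msg]
print(codes)
```

The list makes it easy to see that the trailing '07' in this version is two text characters (0x30 and 0x37), not the single byte 0x07.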

Issues with Bytes from a Microcontroller in Python

I am using Python to read microcontroller values in a Windows-based program. The encodings, byte decodings and values have begun to confuse me. Here is my situation:
In the software, I am allowed to call a receive function once per byte received by the Python interpreter, once per line (not quite sure what that is), or once per message, which I assume is the entire transmission from the microcontroller.
I am struggling with the best way to decode these values. The microcontroller is putting out specific values that correlate to a protocol. For example, calling a function that is supposed to return the hex values:
F0, 79, (the phrase standard_firmata.pde) [then] F7
returns:
b'\xf0y\x02\x03S\x00t\x00a\x00n\x00d\x00a\x00r\x00d\x00F\x00i\x00r\x00m\x00a\x00t\x00a\x00.\x00i\x00n\x00o\x00\xf7'
when set to "once per message". This is what I want: I can see that the correct values are being sent, but there are too many \x00 values included (after every byte, it seems). Additionally, the second byte is shown as y when it is supposed to be 79. It seems like it printed that value in ASCII when all the others were in hex.
How can I ignore all these null characters and get everything in the right format? (I am fine with normal hex values.)

When Python represents a bytes value, it'll use the ASCII representation for anything that has a printable character. Thus the hex 0x79 byte is indeed represented by a y:
>>> b'\x79'
b'y'
Using ASCII characters makes the representation more readable, but doesn't affect the contents. You can use \x.. hex and ASCII notations interchangeably when creating bytes values.
The data appears to encode a UTF-16 message, little endian:
>>> data = b'\xf0y\x02\x03S\x00t\x00a\x00n\x00d\x00a\x00r\x00d\x00F\x00i\x00r\x00m\x00a\x00t\x00a\x00.\x00i\x00n\x00o\x00\xf7'
>>> data[4:-1].decode('utf-16-le')
'StandardFirmata.ino'
UTF-16 uses 2 bytes per character, and for ASCII (and Latin-1) codepoints that means that every second byte is a null.
You can use simple comparisons to test for message types:
if data[:2] == b'\xf0\x79':
    assert data[-1] == 0xf7, "Message did not end with F7 closing byte"
    version = tuple(data[2:4])
    message = data[4:-1].decode('utf-16-le')
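The checks above can be wrapped into a small function. This is a sketch that assumes the layout described in this answer (F0 79, two version bytes, a UTF-16-LE name, then F7). The name parse_firmware_report is made up for illustration, and the hex dump needs Python 3.8 or later:

```python
def parse_firmware_report(data: bytes):
    """Split a report into ((major, minor), name) using the layout assumed above."""
    if data[:2] != b'\xf0\x79':
        raise ValueError("not a firmware report (expected F0 79 at the start)")
    if data[-1] != 0xf7:
        raise ValueError("message did not end with the F7 closing byte")
    version = tuple(data[2:4])
    name = data[4:-1].decode('utf-16-le')
    return version, name


data = (b'\xf0y\x02\x03S\x00t\x00a\x00n\x00d\x00a\x00r\x00d\x00'
        b'F\x00i\x00r\x00m\x00a\x00t\x00a\x00.\x00i\x00n\x00o\x00\xf7')

version, name = parse_firmware_report(data)
header_hex = data[:4].hex(' ')   # pure hex view of the first four bytes

print(header_hex)
print(version, name)
```

The hex view shows 79 for the second byte, confirming that y in the repr and 0x79 are the same value.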

Python Character Encoding

I have a python script that retrieves information from a web service and then looks up data in a MySQL db. The data is unicode when I receive it, however I want the SQL statement to use the actual character (Băcioi in the example below). As you can see, when I try and encode it to utf-8 the result is still not what I'm looking for.
>>> x = u'B\u0103cioi'
>>> x
u'B\u0103cioi'
>>> x.encode('utf-8')
'B\xc4\x83cioi'
>>> print x
Băcioi ## << What I want!

Your encoding is working fine. Python is simply showing you the repr()'d version of it on the command line, which uses \x escapes. You can tell because of the fact that it's also displaying the quotes around the string.
print does not do any mutation of the string - if it prints out the character you want, that's what is actually in the contents of the string.
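The same point can be checked programmatically. The sketch below uses Python 3 syntax (the session above is Python 2) and compares values instead of relying on how the console displays them:

```python
x = 'B\u0103cioi'              # Python 3 str; same text as u'B\u0103cioi'
encoded = x.encode('utf-8')

print(repr(encoded))           # escaped form, as the interactive prompt shows it
print(len(x), len(encoded))    # 6 characters, 7 bytes: the ă takes two bytes

# Decoding restores exactly the original text, so the encoding changed nothing.
round_trip = encoded.decode('utf-8')
```

The \xc4\x83 pair in the escaped form is simply the two-byte UTF-8 encoding of ă.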

How to capture all characters in binary string without python interpreting it

Here is how I reproduce the problem:
Create a log file called 'temp.log' and paste this line into it
DEBUG: packetReceived '\x61\x62\x63'
I want to have a script which will read the line from the log file and decode the binary string part ('\x61\x62\x63'). For the decoding, I am using struct, so:
struct.unpack('BBB', '\x61\x62\x63')
Should give me
(97, 98, 99)
Here is the script which I am using
import re
import struct
import sys

f = open(sys.argv[1], 'r')
for line in f:
    print line
    packet = re.compile(r"packetReceived \'(.*)\'").search(line).group(1)
    # packet is the string r'\x61\x62\x63'
    assert len(packet) == 12
    # this works ok (returns (97, 98, 99))
    struct.unpack('BBB', '\x61\x62\x63')
    # this fails because packet is interpreted as r'\\x61\\x62\x63'
    struct.unpack('BBB', packet)
I run the script using temp.log as the argument to the script.
Hopefully the comments highlight my problem. How can I get the variable packet to be interpreted as '\x61\x62\x63' ??
ASIDE: On the first edit of this question, I assumed that reading the line from the file was the same as this:
line = "DEBUG: packetReceived '\x61\x62\x63'"
which made packet == 'abc'
however it is actually the same as this (using rawstring)
line = r"DEBUG: packetReceived '\x61\x62\x63'"

Python doesn't interpret strings that you pass to regular expressions. The escape sequences were most likely interpreted earlier, when you defined variable line. This works correctly for example:
line = r"DEBUG: packetReceived '\x61\x62\x63'"
print re.compile(r"packetReceived '(.*)'").search(line).group(1)
It prints \x61\x62\x63.

>>> re.compile(r"packetReceived '(.*)'").search(r"DEBUG: packetReceived '\x61\x62\x63'").group(1)
'\\x61\\x62\\x63'
Nope, that line is not where your problem lies.

As described in your question, packet is equal to '\x61\x62\x63'. Its len is 12 bytes, neither 15 nor 3 bytes.
What confuses you is that ipython (which I understand you are using) and the python interpreter display values using the repr() call, which tries to format values as they would appear in your code. Since backslashes are special in Python string constants, repr() displays them doubled, as they would be written in Python code.
This might be of help:
for char in packet:
    print("%5d %2s %2r" % (ord(char), char, char))
Count your characters and see how they are printed. First column displays the ordinal value of the character, second column has the character itself, third column has the repr of the character.
EDIT
Change the last line:
struct.unpack('BBB', packet)
to:
struct.unpack('BBB', packet.decode('string_escape'))
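The 'string_escape' codec exists only in Python 2. On Python 3, one possible equivalent is sketched below. It assumes the captured text is pure ASCII and contains only \xNN-style escapes:

```python
import struct

packet = r'\x61\x62\x63'   # 12 characters of text, as read from the log file

# Interpret the backslash escapes, then map code points 0..255 back to bytes.
raw = packet.encode('ascii').decode('unicode_escape').encode('latin-1')

values = struct.unpack('BBB', raw)
print(len(packet), raw, values)
```

The final encode('latin-1') step is what turns each escaped value back into a single byte, because Latin-1 maps code points 0 to 255 one-to-one onto byte values.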

If you're sure you are receiving twelve characters and not just three represented as twelve, it may be just the printing of the string that is causing you grief.
Compare:
>>> print '\x61\x62\x63'
abc
>>> print r'\x61\x62\x63'
\x61\x62\x63
My 50c is on you actually receiving three characters and them being printed like this:
>>> print ''.join('\\x%02x' % ord(c) for c in 'abc')
\x61\x62\x63
