Why is my hex string not behaving like a string? - Python

My goal here is to turn a small file into a QR code.
So I used binascii.hexlify() to get the hexadecimal representation of the file.
With Pillow I will then build the QR code; this QR code will be read by another script that will turn it back into a file.
import binascii
with open(r"D:\test.png", 'rb') as f:
    content = f.read()
hexstr = str(binascii.hexlify(content))
#print(hexstr)
print(hexstr[:5])
The weird thing here is that hexstr is 64eded8d8d8d6ad3bcefd7a616864b4aea169786434393975eecb73a1b896cae4e80da592d7dcf2... but hexstr[:5] is b'895 (I was expecting 64ede).
Why is that?
Thanks.
PS: I'm using Python 3.6 x64 on a Windows 10 machine.

You're getting b'895 because str(binascii.hexlify(content)) does not decode anything: it gives hexstr the string representation of the bytes object, b' prefix included, something like "b'89504e47...'" (every PNG file starts with the bytes 89 50 4E 47). Its first five characters are therefore b'895. I think what you want is hexstr = binascii.hexlify(content).decode(). This decodes the bytes object, which contains only ASCII hex digits, into the corresponding str.
import binascii
with open(r"D:\test.png", 'rb') as f:
    content = f.read()
hexstr = binascii.hexlify(content).decode()
#print(hexstr)
print(hexstr[:5])
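As a quick check of the difference, here is a minimal sketch that uses the 8-byte PNG signature as stand-in file contents (the sample data is an assumption, not the asker's file). It also shows how the script that reads the QR code could turn the hex text back into the original bytes with bytes.fromhex():

```python
import binascii

# Assumed stand-in for the file contents: the 8-byte PNG signature.
content = b"\x89PNG\r\n\x1a\n"

# str() on a bytes object returns its repr, b'...' wrapper included.
wrong = str(binascii.hexlify(content))
print(wrong[:5])    # b'895

# .decode() turns the ASCII hex digits into a plain str.
hexstr = binascii.hexlify(content).decode()
print(hexstr[:5])   # 89504

# The script that reads the QR code can rebuild the original bytes.
restored = bytes.fromhex(hexstr)
print(restored == content)  # True
```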

Related

Save bytes in a .txt and read out as bytes later

I have written a small Python script which encrypts a message with RSA.
Now I want to save the bytes in a .txt file to read them later.
But when I use str(...) on them, I don't know how to convert the string back.
For example I encrypted "Test" to b'Y\xf8\xbc\xca\x14\x0f\x80\xd3\xc6\xce\xecE\x14\xc1\xaf\xbd\x82\xd24\xcf\x04\xe2\x9a\x81NF\xbeXi\x85\xef\xc4\xbbl\xd3(5\x80\xe4\xde3\x8eC\xd2jR*\xb7.gq\x8c\x8b\xa12\x1a\x10+\xbf\xefHZ\n/'
and saved it as a string.
When I apply bytes(...) to it, I get the error: TypeError: string argument without an encoding.
What can I do in order to do this?
You've saved the Python string representation of a binary byte array (bytestring).
To get the actual bytes back from such a representation, pass it through ast.literal_eval():
>>> import ast
>>> s = r"b'Y\xf8\xbc\xca\x14\x0f\x80\xd3\xc6\xce\xecE\x14\xc1\xaf\xbd\x82\xd24\xcf\x04\xe2\x9a\x81NF\xbeXi\x85\xef\xc4\xbbl\xd3(5\x80\xe4\xde3\x8eC\xd2jR*\xb7.gq\x8c\x8b\xa12\x1a\x10+\xbf\xefHZ\n/'"
>>> b = ast.literal_eval(s)
>>> b
b'Y\xf8\xbc\xca\x14\x0f\x80\xd3\xc6\xce\xecE\x14\xc1\xaf\xbd\x82\xd24\xcf\x04\xe2\x9a\x81NF\xbeXi\x85\xef\xc4\xbbl\xd3(5\x80\xe4\xde3\x8eC\xd2jR*\xb7.gq\x8c\x8b\xa12\x1a\x10+\xbf\xefHZ\n/'
Better yet, just save the binary bytes to your file without passing through a string:
encrypted_bytes = my_rsa("Test")  # my_rsa stands for your own encryption function
with open("encrypted.bin", "wb") as f:
    f.write(encrypted_bytes)
# ...
with open("encrypted.bin", "rb") as f:
    encrypted_bytes = f.read()
If you really want a "text-safe" format for those bytes, use base64.b64encode() and base64.b64decode().
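A minimal sketch of the base64 route, assuming a short made-up byte string in place of the real RSA output and a temporary file path instead of a real one:

```python
import base64
import os
import tempfile

# Made-up stand-in for the RSA output; any bytes behave the same way.
encrypted_bytes = b"Y\xf8\xbc\xca\x14\x0f\x80\xd3\n/"

# Hypothetical location for the text file.
path = os.path.join(tempfile.mkdtemp(), "encrypted.txt")

# base64 output uses only ASCII letters, digits, '+', '/' and '=', so it is safe in a text file.
with open(path, "w", encoding="ascii") as f:
    f.write(base64.b64encode(encrypted_bytes).decode("ascii"))

# Later: read the text back and decode it into the original bytes.
with open(path, "r", encoding="ascii") as f:
    restored = base64.b64decode(f.read())

print(restored == encrypted_bytes)  # True
```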

Python: Bytes not being converted properly?

I'm VERY new to binary stuff, and I'm struggling a little bit.
I'm trying to convert a binary file to text. So far, this is my code:
with open(file_path, 'rb') as f:
    data = f.read()
temp_data = str(data)
if temp_data[-1] == '\\':
    temp_data = temp_data[:-1]
temp_data = bytes(temp_data, 'utf-8')
text = temp_data.decode('utf-8')
It seems to be working... partially. I see some things in the long byte string that I want to see, like a file name and timestamp. However, I'm still seeing a lot of byte values. The value of the text variable is:
b'\x00\x00\x00\x00T\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x004\x01\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00X\x01\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00x\x01\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00TCODEF1001.DAR_MeasLog.2019-03-05+01:10:45.2019-03-05+01:11:21.1.100.0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x95\xcc}\\\xba\xcc}\\LOG\x00\x00\x00\x00\x00\x00\x00\x00\x00OKL\x00\x04\x00\x00\x00\x01\x00\x00\x00VKL\x00\x05\x00\x00\x00\x01\x00\x00\x00YKL\x00\x06\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00h\xcc}\\\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\xa4\xcc}\\\x02\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00M\x00\x00\x00\x95\xcc}\\\xb9\xcc}\\'
I have no idea how to fix this, or what any of this means.
Note: I needed to strip the last character '\' because the decoding was giving me an error "could not decode because last character is '\'", or something along those lines.
Thank you!
EDIT: I changed the code so now it looks like this:
with open(file_path, 'rb') as f:
    data = f.read()
readable_str = data.decode('utf-16')
bytes_again = readable_str.encode('utf-16')
When I print readable_str, I'm getting non-ASCII values which should not happen at all. I get text like this:
TĴŘŸ䍔䑏䙅〱㄰䐮剁䵟慥䱳杯㈮㄰ⴹ㌰〭⬵㄰ㄺ㨰㔴㈮㄰ⴹ㌰〭⬵㄰ㄺ㨱ㄲㄮㄮ〰〮첕屽첺屽佌G䭏L䭖L䭙L챨屽첤屽M첕屽첹屽
The decoding does not work with 'utf-8' or 'utf-32'. Is there a way to tell which encoding to use based on this? Are there other encodings out there that I have not tried? Thanks!
The approach to reading and writing data in Python 3 is much more explicit than it used to be. Assume the input is bytes, decode it to str before working with the data in the script, and encode it back to bytes before writing it out.
I highly recommend Ned Batchelder's (nedbat) talk "Pragmatic Unicode" on Python's Unicode handling and how to work correctly with bytes input and output.
Regardless, what you want to do is
with open('file.txt', 'rb') as fo:
    data = fo.read()  # This is in bytes
# We "decipher" the bytes into something we can work with
readable_str = data.decode('utf-8')
bytes_again = readable_str.encode('utf-8')
with open('other_file.txt', 'wb') as fw:
    fw.write(bytes_again)
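The bytes shown in the question look like a binary record layout (NUL padding, what appear to be 4-byte integers) rather than encoded text, which would explain why decoding the whole file with UTF-8, UTF-16 or UTF-32 either fails or yields the mojibake in the edit. If what is needed is just the readable fragments (file name, timestamps), one alternative, similar to the Unix `strings` tool, is to pull out runs of printable ASCII. A minimal sketch; the function name and the minimum run length of 4 are assumptions, and the sample is a shortened version of the question's data:

```python
import re

def printable_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes (space through '~')."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# A shortened sample modelled on the bytes shown in the question.
sample = (b"\x01\x00\x00\x00TCODEF1001.DAR_MeasLog.2019-03-05+01:10:45"
          b"\x00\x00LOG\x00OKL\x00")
print(printable_strings(sample))
# ['TCODEF1001.DAR_MeasLog.2019-03-05+01:10:45']
```

Lowering min_len also picks up short markers such as LOG, at the cost of more noise from random binary data.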

How to convert bytes data to string without changing data using python3

How can I convert bytes to a string without changing the data?
E.g.
Input:
file_data = b'\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'
Output:
'\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'
I want to write image data, plus some additional data, using StringIO. Below is my code snippet:
img_buf = StringIO()
f = open("Sample_image.jpg", "rb")
file_data = f.read()
img_buf.write('\r\n' + file_data + '\r\n')
This works fine with Python 2.7, but I want it to work with Python 3.4.
On the read operation, file_data = f.read() returns a bytes object, something like this:
b'\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'
When writing, img_buf accepts only str data, so I can't write file_data together with the additional characters.
So I want to convert file_data, as it is, into a str object without changing its data. Something like this:
'\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'
so that I can concatenate and write the image data.
I don't want to decode or encode the data. Any suggestions would be helpful. Thanks in advance.
It is not clear what kind of output you desire. If you are interested in aesthetically translating bytes to a string representation without encoding:
s = str(file_data)[1:]
print(s)
# '\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'
This is the informal string representation of the original byte string (no conversion).
Details
The official string representation looks like this:
s
# "'\\xb4\\xeb7s\\x14q[\\xc4\\xbb\\x8e\\xd4\\xe0\\x01\\xec+\\x8f\\xf8c\\xff\\x00 \\xeb\\xff'"
String representation handles how a string looks. When the REPL echoes s, it shows repr(s): every backslash stored in the string is escaped as \\, and the value is wrapped in double quotes because it contains single quotes. print(s) writes the characters themselves, so each \\ appears as a single backslash.
String interpretation handles what a string means. The same bytes mean something different depending on the encoding applied. Here we interpret the bytes (e.g. \xb4, \xeb, 7, s) as UTF-8. Byte sequences that are not valid UTF-8 are replaced with the Unicode replacement character, �:
file_data.decode("utf-8", "replace")
# '��7s\x14q[Ļ���\x01�+��c�\x00 ��'
To work reliably with text, you have to convert bytes to str by decoding them with the right encoding.
In short, there is a difference in string output between how it looks (representation) and what it means (interpretation). Clarify which you prefer and proceed accordingly.
Addendum
If your question is "how do I concatenate a byte string?", here is one approach:
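import io
buffer = io.BytesIO()
# Write directly: leaving a `with buffer as f:` block would close the buffer,
# and getvalue() on a closed BytesIO raises ValueError
buffer.write(b"\r\n")
buffer.write(file_data)
buffer.write(b"\r\n")
print(buffer.getvalue())
# b'\r\n\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff\r\n'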
Equivalently:
buffer = b""
buffer += b"\r\n"
buffer += file_data
buffer += b"\r\n"
buffer
# b'\r\n\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff\r\n'
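If the goal really is a str holding one character per byte so it can go through StringIO, one option (a suggestion, not something the answer covers) is the Latin-1 codec: it maps each byte value 0x00-0xFF to the code point with the same number, so decoding never fails and encoding with Latin-1 restores the exact bytes. It is still a decode, but a lossless one. A minimal sketch, with the question's sample bytes standing in for the JPEG data:

```python
import io

# Sample bytes from the question, standing in for the JPEG data.
file_data = b'\xb4\xeb7s\x14q[\xc4\xbb\x8e\xd4\xe0\x01\xec+\x8f\xf8c\xff\x00 \xeb\xff'

# Latin-1 maps byte 0xNN to code point U+00NN, so every byte becomes one character.
text = file_data.decode("latin-1")

img_buf = io.StringIO()
img_buf.write("\r\n" + text + "\r\n")

# Encoding with Latin-1 gives back exactly the original bytes.
restored = img_buf.getvalue().encode("latin-1")
print(restored == b"\r\n" + file_data + b"\r\n")  # True
```

For writing binary data out, the BytesIO approach is still the more natural fit; the Latin-1 trick only helps when an API insists on str.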

Checking if a byte is ascii printable

I am reading in a file using binary settings:
with open(filename, 'rb') as f:
I am then reading the entire file into a variable:
x = f.read()
My problem is that I want to check whether the bytes in x are ASCII printable. So I want to compare the bytes to see if they are within the range of, say, 32-128 in decimal notation. What would be the easiest way to go about doing this?
I have toyed around with the ord() function and various hex functions, since I have previously converted the bytes into hex elsewhere in my project, but nothing seems to be working.
I'm new to python but have experience in other languages. Can anyone point me in the right direction? Thanks.
You could check each byte against string.printable.
>>> import string
>>> string.printable
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
printable_chars = bytes(string.printable, 'ascii')
with open(filename, 'rb') as f:
    printable = all(char in printable_chars for char in f.read())
For greater efficiency, use a set: a membership test on a set is O(1) on average, versus O(n) on a bytes object:
printable_chars = set(bytes(string.printable, 'ascii'))
with open(filename, 'rb') as f:
    printable = all(char in printable_chars for char in f.read())
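The question also asks about comparing byte values against a numeric range. Iterating over a bytes object yields ints, so the comparison can be written directly; note that string.printable also contains whitespace such as \t and \n, while the range 32-126 does not. A sketch of both checks (the function names are made up for illustration), the second using bytes.translate() to delete every allowed byte and testing whether anything is left:

```python
import string

PRINTABLE = bytes(string.printable, "ascii")

def all_in_range(data):
    # Iterating over bytes yields ints; 32..126 is space through '~' (no tabs or newlines).
    return all(32 <= b <= 126 for b in data)

def all_in_string_printable(data):
    # Delete every byte that appears in string.printable; an empty result means all were printable.
    return not data.translate(None, PRINTABLE)

print(all_in_range(b"Hello, world!"))          # True
print(all_in_range(b"two\nlines"))             # False
print(all_in_string_printable(b"two\nlines"))  # True
print(all_in_string_printable(b"\x00binary"))  # False
```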

Reading a raw RAM data in python

I took a dump of the RAM using a freeware tool called DumpIt (http://www.downloadcrew.com/article/23854-dumpit). The software saved the RAM data as a raw file, which can be read using a hex editor (HxD, http://www.downloadcrew.com/article/10814-hxd).
How do I get the string data visible in the hex editor (see image) in Python?
For example, I want to get the string "http://www.downloadcrew.com/article/23854-dumpit" (in the red box in the image) in Python by reading the raw file generated by DumpIt.
EDIT
I tried using this code, but it just stalls and nothing happens:
#!/usr/bin/python
import binascii
filename = "LEMARC-20140401-181003.raw"
g = open("out","w")
str=""
with open(filename,"rb") as f:
    for lines in f:
        str+=lines
str = binascii.unhexlify(str)
f.close()
g.write(str)
g.close
In Python 2:
>>> "437c2123".decode('hex')
'C|!#'
In Python 3 (this also works in Python 2; before 2.6 you can't prefix the string with b):
>>> import binascii
>>> binascii.unhexlify(b"437c2123")
b'C|!#'
So in your case, decode the entire hex string to get the ASCII text, and then you can extract the URL with a regex or your own parsing function.
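Note that the raw dump file already stores the bytes themselves; the hex editor only displays them in hexadecimal, so the file can be searched without any unhexlify step. Building one huge string from a RAM-sized file, as the code in the question does, is also slow and memory-hungry. A minimal sketch that reads the file in binary chunks and searches each one with a bytes regex; the URL pattern is a simplifying assumption (ASCII URLs only, no UTF-16 strings), and find_urls is a hypothetical helper name:

```python
import re

# Assumption: the URLs are stored as single-byte ASCII with no spaces or control bytes.
# (Strings kept as UTF-16 in memory would need a different pattern.)
URL_RE = re.compile(rb"https?://[\x21-\x7e]+")

def find_urls(f, chunk_size=16 * 1024 * 1024):
    """Return URLs found in a binary file object, read chunk_size bytes at a time."""
    urls = []
    tail = b""
    while True:
        block = f.read(chunk_size)
        window = tail + block
        if not block:
            # End of file: scan whatever is left completely.
            urls.extend(m.group().decode("ascii") for m in URL_RE.finditer(window))
            return urls
        # Back up to just after the last byte that cannot occur in a URL,
        # so that no URL is split between two chunks.
        cut = len(window)
        while cut > 0 and 0x21 <= window[cut - 1] <= 0x7E:
            cut -= 1
        urls.extend(m.group().decode("ascii") for m in URL_RE.finditer(window, 0, cut))
        tail = window[cut:]
```

Usage, with the file name from the question:

with open("LEMARC-20140401-181003.raw", "rb") as f:
    for url in find_urls(f):
        print(url)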
