Convert Python Bytes to String Without Encoding - python

I am using Python 3.6 and I have an image as bytes:
img = b'\xff\xd8\xff\xe0\x00\x10JFIF\x00'
I need to convert the bytes into a string without encoding so it looks like:
raw_img = '\xff\xd8\xff\xe0\x00\x10JFIF\x00'
The goal is to incorporate this into an html image tag:
<img src="'data:image/png;base64," + base64.b64encode(raw_img) + "' />"

Why not just call str and remove the b after?
In:
str(img)[2:-1]
Out:
'\xff\xd8\xff\xe0\x00\x10JFIF\x00'

img.decode("utf-8")
You can decode the variable with the above. Then convert it to base64.
"<img src='data:image/png;base64,{}'/>".format( base64.b64encode(img.decode("utf-8")) )
UPDATED:
What you really want is this:
raw_img = repr(img)
"<img src='data:image/png;base64,{}'/>".format( base64.b64encode(raw_img) )

Since you just need to convert the image to string why not just use str() function?
>>> img = b'\xff\xd8\xff\xe0\x00\x10JFIF\x00'
>>> type(img)
<class 'bytes'>
>>>
>>>raw_img = str(img)
>>> type(str(img))
<class 'str'>
>>>
img is in bytes, but when you use str() it is converted to type string.
An encoding can also be specified https://docs.python.org/3/library/stdtypes.html#str, which would be a more natural way to do things:
str(img, encoding='ansi')
As suggested in these answers

I didn't solve this but here's some research on it(3Feb2022): This encoding is latin (or latin-1) and it's hard to print because Python wants to print it in another format. But for your case they should be the same. And for a data:image/png;base64 base64 code should be used.
My test code:
import codecs
img = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"
desired = "\xff\xd8\xff\xe0\x00\x10JFIF\x00"
str_decode = img.decode("latin-1")
str_decode_2 = str(img, "latin-1")
codecs_decode = codecs.decode(img, "latin-1")
print(desired.encode("latin-1") == img)
print(str_decode == desired)
print(str_decode == str_decode_2)
print(str_decode == codecs_decode)
print("desired:", repr(desired)) ##devprint
This gives 4 True and a desired: ÿØÿà\x00\x10JFIF\x00 with Python 3.10.

I'm pretty sure img is the byte string that you want to pass to base64.b64encode:
>>> import base64
>>> img = b'\xff\xd8\xff\xe0\x00\x10JFIF\x00'
>>> base64.b64encode(img)
b'/9j/4AAQSkZJRgA='
If you want to incorporate that into an HTML string, use
html = b'<img src="data:image/png;base64,' + base64.b64encode(img) + b' />'

I've solved it (2022 - bit late to the party...)
If you try img_raw.decode() you get the
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte error
But if you leave img_raw as a binary string and pass it into b64encode and then decode it, it doesn't have the UnicodeDecodeError, and you can pass it in as a data string to your image tag.
base64.b64encode(raw_image).decode()

Related

Converting RSA signature to String

I'm creating my RSA Signature like this.
transactionStr = json.dumps(GenesisTransaction())
signature = rsa.sign(transactionStr.encode(), client.privateKey, 'SHA-1')
But I'm unable to get it to a string so I can save it.
I have tried decoding it using utf8
signature.decode("utf8")
but I get the error "'utf-8' codec can't decode byte 0xe3 in position 2"
any way I can do this?
A RSA signature looks like this
b'aL\xe3\xf4\xbeEM\xc4\x9e\n\x9e\xf4M`\xba\x85*\x13\xd52x\xd9\\\xe8F\x1c\x07\x90[/\x9dy\xce\xa9IV\x89\xe0\xcd9\\_3\x1e\xaa\x80\xdea\xd1\xbem/\x8e\x91\xbd\x13\x12o\x8c\xed\xf6\x89\xb5\x0b'
.decode('utf8') is used to decode text encoded in UTF8, not arbitrary bytes. Convert the byte string to a hexadecimal string instead:
>>> sig = b'aL\xe3\xf4\xbeEM\xc4\x9e\n\x9e\xf4M`\xba\x85*\x13\xd52x\xd9\\\xe8F\x1c\x07\x90[/\x9dy\xce\xa9IV\x89\xe0\xcd9\\_3\x1e\xaa\x80\xdea\xd1\xbem/\x8e\x91\xbd\x13\x12o\x8c\xed\xf6\x89\xb5\x0b'
>>> s = sig.hex()
>>> s
'614ce3f4be454dc49e0a9ef44d60ba852a13d53278d95ce8461c07905b2f9d79cea9495689e0cd395c5f331eaa80de61d1be6d2f8e91bd13126f8cedf689b50b'
To convert back, if needed:
>>> b = bytes.fromhex(s)
>>> b
b'aL\xe3\xf4\xbeEM\xc4\x9e\n\x9e\xf4M`\xba\x85*\x13\xd52x\xd9\\\xe8F\x1c\x07\x90[/\x9dy\xce\xa9IV\x89\xe0\xcd9\\_3\x1e\xaa\x80\xdea\xd1\xbem/\x8e\x91\xbd\x13\x12o\x8c\xed\xf6\x89\xb5\x0b'
>>> b==sig
True

How to resolve AssertionError for converting string characters to bytes using run length encoding?

I have a problem to solve but once I submit my solution the result shows an AssertionError.
I want to be able to convert my string of characters into a byte format using a technique called run-length encoding - https://en.wikipedia.org/wiki/Run-length_encoding
I was doing some research on it and found that what I felt done by me was correct but the solution was apparently not correct on the website and I get this: AssertionError: 'a5b6c4' is not an instance of <class 'bytes'> : compress('') should return bytes
My code:
from collections import OrderedDict
def compress(raw=str)->bytes:
dict=OrderedDict.fromkeys(my_str_as_bytes, 0)
for ch in my_str_as_bytes:
dict[ch] += 1
output = ''
for key,value in dict.items():
output = output + key + str(value)
return output
my_str_as_bytes = "aaaaabbbbbbcccc"
print (bytes(compress(my_str_as_bytes,),encoding='UTF-8'))
The result on my IDE was:
b'a5b6c4'
I'm not sure of what I did is encoding the string and changing it into a byte or not. Any help would be appreciated.
Few of the links I checked:
https://www.tutorialspoint.com/run-length-encoding-in-python
https://www.geeksforgeeks.org/run-length-encoding-python/

How to decode a string representation of a bytes object?

I have a string which includes encoded bytes inside it:
str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
I want to decode it, but I can't since it has become a string. Therefore I want to ask whether there is any way I can convert it into
str2 = b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'
Here str2 is a bytes object which I can decode easily using
str2.decode('utf-8')
to get the final result:
'Output file 문항분석.xlsx Created'
You could use ast.literal_eval:
>>> print(str1)
b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'
>>> type(str1)
<class 'str'>
>>> from ast import literal_eval
>>> literal_eval(str1).decode('utf-8')
'Output file 문항분석.xlsx Created'
Based on the SyntaxError mentioned in your comments, you may be having a testing issue when attempting to print due to the fact that stdout is set to ascii in your console (and you may also find that your console does not support some of the characters you may be trying to print). You can try something like the following to set sys.stdout to utf-8 and see what your console will print (just using string slice and encode below to get bytes rather than the ast.literal_eval approach that has already been suggested):
import codecs
import sys
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.buffer)
s = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
b = s[2:-1].encode().decode('utf-8')
A simple way is to assume that all the characters of the initial strings are in the [0,256) range and map to the same Unicode value, which means that it is a Latin1 encoded string.
The conversion is then trivial:
str1[2:-1].encode('Latin1').decode('utf8')
Finally I have found an answer where i use a function to cast a string to bytes without encoding.Given string
str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
now i take only actual encoded text inside of it
str1[2:-1]
and pass this to the function which convert the string to bytes without encoding its values
import struct
def rawbytes(s):
"""Convert a string to raw bytes without encoding"""
outlist = []
for cp in s:
num = ord(cp)
if num < 255:
outlist.append(struct.pack('B', num))
elif num < 65535:
outlist.append(struct.pack('>H', num))
else:
b = (num & 0xFF0000) >> 16
H = num & 0xFFFF
outlist.append(struct.pack('>bH', b, H))
return b''.join(outlist)
So, calling the function would convert it to bytes which then is decoded
rawbytes(str1[2:-1]).decode('utf-8')
will give the correct output
'Output file 문항분석.xlsx Created'

How to decode base64 in python3

I have a base64 encrypt code, and I can't decode in python3.5
import base64
code = "YWRtaW46MjAyY2I5NjJhYzU5MDc1Yjk2NGIwNzE1MmQyMzRiNzA" # Unencrypt is 202cb962ac59075b964b07152d234b70
base64.b64decode(code)
Result:
binascii.Error: Incorrect padding
But same website(base64decode) can decode it,
Please anybody can tell me why, and how to use python3.5 decode it?
Thanks
Base64 needs a string with length multiple of 4. If the string is short, it is padded with 1 to 3 =.
import base64
code = "YWRtaW46MjAyY2I5NjJhYzU5MDc1Yjk2NGIwNzE1MmQyMzRiNzA="
base64.b64decode(code)
# b'admin:202cb962ac59075b964b07152d234b70'
According to this answer, you can just add the required padding.
code = "YWRtaW46MjAyY2I5NjJhYzU5MDc1Yjk2NGIwNzE1MmQyMzRiNzA"
b64_string = code
b64_string += "=" * ((4 - len(b64_string) % 4) % 4)
base64.b64decode(b64_string) #'admin:202cb962ac59075b964b07152d234b70'
I tried the other way around. If you know what the unencrypted value is:
>>> import base64
>>> unencoded = b'202cb962ac59075b964b07152d234b70'
>>> encoded = base64.b64encode(unencoded)
>>> print(encoded)
b'MjAyY2I5NjJhYzU5MDc1Yjk2NGIwNzE1MmQyMzRiNzA='
>>> decoded = base64.b64decode(encoded)
>>> print(decoded)
b'202cb962ac59075b964b07152d234b70'
Now you see the correct padding. b'MjAyY2I5NjJhYzU5MDc1Yjk2NGIwNzE1MmQyMzRiNzA=
It actually seems to just be that code is incorrectly padded (code is incomplete)
import base64
code = "YWRtaW46MjAyY2I5NjJhYzU5MDc1Yjk2NGIwNzE1MmQyMzRiNzA"
base64.b64decode(code+"=")
returns b'admin:202cb962ac59075b964b07152d234b70'

How could list decode to 'UTF-8'

I got a list = [0x97, 0x52], not unicode object. this is unicode of a charactor '青'(u'\u9752'). How could I change this list to unicode object first, then encode to 'UTF-8'?
bytes = [0x97, 0x52]
code = bytes[0] * 256 + bytes[1] # build the 16-bit code
char = unichr(code) # convert code to unicode
utf8 = char.encode('utf-8') # encode unicode as utf-8
print utf8 # prints '青'
Not sure if this is the most elegant way, but it works for this particular example.
>>> ''.join([chr(x) for x in [0x97, 0x52]]).decode('utf-16be')
u'\u9752'

Categories