Converting an RSA signature to a string - Python

I'm creating my RSA signature like this:
transactionStr = json.dumps(GenesisTransaction())
signature = rsa.sign(transactionStr.encode(), client.privateKey, 'SHA-1')
But I'm unable to convert it to a string so that I can save it.
I have tried decoding it as UTF-8:
signature.decode("utf8")
but I get the error "'utf-8' codec can't decode byte 0xe3 in position 2".
Is there any way I can do this?
An RSA signature looks like this:
b'aL\xe3\xf4\xbeEM\xc4\x9e\n\x9e\xf4M`\xba\x85*\x13\xd52x\xd9\\\xe8F\x1c\x07\x90[/\x9dy\xce\xa9IV\x89\xe0\xcd9\\_3\x1e\xaa\x80\xdea\xd1\xbem/\x8e\x91\xbd\x13\x12o\x8c\xed\xf6\x89\xb5\x0b'

.decode('utf8') is for decoding text that was encoded as UTF-8, not arbitrary bytes. Convert the byte string to a hexadecimal string instead:
>>> sig = b'aL\xe3\xf4\xbeEM\xc4\x9e\n\x9e\xf4M`\xba\x85*\x13\xd52x\xd9\\\xe8F\x1c\x07\x90[/\x9dy\xce\xa9IV\x89\xe0\xcd9\\_3\x1e\xaa\x80\xdea\xd1\xbem/\x8e\x91\xbd\x13\x12o\x8c\xed\xf6\x89\xb5\x0b'
>>> s = sig.hex()
>>> s
'614ce3f4be454dc49e0a9ef44d60ba852a13d53278d95ce8461c07905b2f9d79cea9495689e0cd395c5f331eaa80de61d1be6d2f8e91bd13126f8cedf689b50b'
To convert back, if needed:
>>> b = bytes.fromhex(s)
>>> b
b'aL\xe3\xf4\xbeEM\xc4\x9e\n\x9e\xf4M`\xba\x85*\x13\xd52x\xd9\\\xe8F\x1c\x07\x90[/\x9dy\xce\xa9IV\x89\xe0\xcd9\\_3\x1e\xaa\x80\xdea\xd1\xbem/\x8e\x91\xbd\x13\x12o\x8c\xed\xf6\x89\xb5\x0b'
>>> b==sig
True
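Applied to the code in the question, the full sign/save/verify round trip might look like the sketch below. This assumes the rsa package from PyPI; the key pair and the transaction dict are hypothetical stand-ins for client.privateKey and GenesisTransaction():
import json
import rsa

# Hypothetical stand-ins for the question's client.privateKey / GenesisTransaction()
pubkey, privkey = rsa.newkeys(512)
transaction_str = json.dumps({"sender": "genesis", "amount": 50})

# Sign, then store the signature as a hex string
signature = rsa.sign(transaction_str.encode(), privkey, 'SHA-1')
stored = signature.hex()

# Later: restore the bytes and verify (rsa.verify raises on failure)
restored = bytes.fromhex(stored)
rsa.verify(transaction_str.encode(), restored, pubkey)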

Related

How to permissively decode a UTF-8 bytearray?

I need to decode a UTF-8 sequence, which is stored in a bytearray, to a string.
The UTF-8 sequence might contain erroneous parts. In this case I need to decode as much as possible and (optionally?) substitute invalid parts with something like "?".
# First part decodes to "ABÄC"
b = bytearray([0x41, 0x42, 0xC3, 0x84, 0x43])
s = str(b, "utf-8")
print(s)
# Second part, invalid sequence, want to decode to something like "AB?C"
b = bytearray([0x41, 0x42, 0xC3, 0x43])
s = str(b, "utf-8")
print(s)
What's the best way to achieve this in Python 3?
There are several built-in error-handling schemes for encoding and decoding str to and from bytes and bytearray, usable with e.g. bytearray.decode(). For example:
>>> b = bytearray([0x41, 0x42, 0xC3, 0x43])
>>> b.decode('utf8', errors='ignore') # discard malformed bytes
'ABC'
>>> b.decode('utf8', errors='replace') # replace with U+FFFD
'AB�C'
>>> b.decode('utf8', errors='backslashreplace') # replace with backslash-escape
'AB\\xc3C'
In addition, you can write your own error handler and register it:
import codecs

def my_handler(exception):
    """Replace unexpected bytes with '?'."""
    return '?', exception.end

codecs.register_error('my_handler', my_handler)
>>> b.decode('utf8', errors='my_handler')
'AB?C'
All of these error handling schemes can also be used with the str() constructor as in your question:
>>> str(b, 'utf8', errors='my_handler')
'AB?C'
... although it's more idiomatic to call bytes.decode() explicitly.
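One more built-in scheme worth knowing, if you need to recover the original bytes later instead of discarding them, is 'surrogateescape', which round-trips losslessly:
>>> b = bytearray([0x41, 0x42, 0xC3, 0x43])
>>> s = b.decode('utf8', errors='surrogateescape')
>>> s
'AB\udcc3C'
>>> s.encode('utf8', errors='surrogateescape') == bytes(b)
True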

Python 3 prints a bytestring and says the string is of type str?

The code I have is from a single-sign-on function:
from urllib.parse import unquote
import base64
payload = unquote(payload)
print(payload)
print(type(payload))
decoded = base64.decodestring(payload)
decodestring is complaining that I gave it a string instead of bytes...
File "/Users/Jeff/Development/langalang/proj/discourse/views.py", line 38, in sso
decoded = base64.decodestring(payload)
File "/Users/Jeff/.virtualenvs/proj/lib/python3.6/base64.py", line 559, in decodestring
return decodebytes(s)
File "/Users/Jeff/.virtualenvs/proj/lib/python3.6/base64.py", line 551, in decodebytes
_input_type_check(s)
File "/Users/Jeff/.virtualenvs/proj/lib/python3.6/base64.py", line 520, in _input_type_check
raise TypeError(msg) from err
TypeError: expected bytes-like object, not str
which is fine, but when I look at what my print statements printed to the terminal, I see this:
b'bm9uY2U9NDI5NDg5OTU0NjU4MjAzODkyNTI=\n'
<class 'str'>
It seems to be saying it's a string of bytes, but then it says it's a str.
What is going on here?
If I add .encode() to the end of the payload declaration, I see this:
payload = unquote(payload).encode()
b"b'bm9uY2U9NDQxMTQ4MzIyNDMwNjU3MjcyMDM=\\n'"
<class 'bytes'>
EDIT: adding the method that makes the payload
@patch("discourse.views.HttpResponseRedirect")
def test_sso_success(self, mock_redirect):
    """Test for the sso view"""
    # Generate a random number, encode it for the URL, sign it with a hash
    nonce = "".join([str(random.randint(0, 9)) for i in range(20)])
    # The sso payload needs to be a dict of params
    params = {"nonce": nonce}
    payload = base64.encodestring(urlencode(params).encode())
    print(payload.decode() + " tests")
    key = settings.SSO_SECRET
    h = hmac.new(key.encode(), payload, digestmod=hashlib.sha256)
    signature = h.hexdigest()
    url = reverse("discourse:sso") + "?sso=%s&sig=%s" % (payload, signature)
    req = self.rf.get(url)
    req.user = self.user
    response = sso(req)
    self.assertTrue(mock_redirect.called)
Your payload is generated by base64.encodestring(s), which per the documentation:
Encode the bytes-like object s, which can contain arbitrary binary
data, and return bytes containing the base64-encoded data, with
newlines (b'\n') inserted after every 76 bytes of output, and ensuring
that there is a trailing newline, as per RFC 2045 (MIME).
Then you run urllib.parse.unquote on it. But the b' prefix you see was baked into the string's value earlier: when the bytes payload was interpolated into the URL with "%s", Python called str() on it, producing its repr. So what arrives is a str rather than bytes, and one that is moreover not valid base64.
it seems to be saying it is a string of bytes, but then it says that it is a string.
Looks like the string you have is "b'bm9uY2U9NDQxMTQ4MzIyNDMwNjU3MjcyMDM=\\n'", so the leading b is not a bytes-literal marker; it's just part of the string's value.
So you need to strip those characters off before passing it to the base64 decoder:
from urllib.parse import unquote
import base64

payload = unquote(payload)
print(payload[2:-1])  # strip the leading "b'" and the trailing "'"
enc = base64.decodebytes(payload[2:-1].encode())
print(enc)
The original error already suggested this, and the printed value confirms it: your payload is a unicode string that happens to begin with the prefix "b'" and end with a single "'".
Such a string is generally built with a repr call:
>>> b = b'abc' # b is a byte string
>>> r = repr(b) # by construction r is a unicode string
>>> print(r) # will look like a byte string
b'abc'
>>> print(b) # what is printed for a true byte string
abc
You can revert to a true byte string with ast.literal_eval:
>>> import ast
>>> b2 = ast.literal_eval(r)
>>> type(b2)
<class 'bytes'>
>>> b == b2
True
But the revert is only a workaround; you should track down where in your code the repr of the byte string gets built.
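In the test method above, that place is the "%s" interpolation of the bytes payload into the URL. A minimal sketch of one possible fix, reusing the names from that test and assuming base64.encodebytes (the modern spelling of encodestring) plus urllib.parse.quote for URL safety:
from urllib.parse import quote, urlencode
import base64

payload = base64.encodebytes(urlencode(params).encode())
sso = quote(payload.decode())  # text now, so "%s" no longer produces a b'...' repr
url = reverse("discourse:sso") + "?sso=%s&sig=%s" % (sso, signature)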

Python: Convert utf-8 string to byte string [duplicate]

I have the following function to parse a UTF-8 string from a sequence of bytes.
Note -- length_size is the number of bytes it takes to represent the length of the UTF-8 string.
import struct

def parse_utf8(self, bytes, length_size):
    length = bytes2int(bytes[0:length_size])
    value = ''.join(['%c' % b for b in bytes[length_size:length_size + length]])
    return value

def bytes2int(raw_bytes, signed=False):
    """
    Convert a string of bytes to an integer (assumes little-endian byte order)
    """
    if len(raw_bytes) == 0:
        return None
    fmt = {1: 'B', 2: 'H', 4: 'I', 8: 'Q'}[len(raw_bytes)]
    if signed:
        fmt = fmt.lower()
    return struct.unpack('<' + fmt, raw_bytes)[0]
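For reference, bytes2int behaves like this on small inputs (a quick sanity check with hypothetical values):
assert bytes2int(b'\x05\x00') == 5                    # 2-byte little-endian unsigned
assert bytes2int(b'\xff\x7f', signed=True) == 32767   # 2-byte little-endian signed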
I'd like to write the function in reverse -- i.e. a function that will take a UTF-8 encoded string and return its representation as a byte string.
So far, I have the following:
def create_utf8(self, utf8_string):
    return utf8_string.encode('utf-8')
I run into the following error when attempting to test it:
File "writer.py", line 229, in create_utf8
return utf8_string.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0x98 in position 0: ordinal not in range(128)
If possible, I'd like to adopt a structure for the code similar to the parse_utf8 example. What am I doing wrong?
Thank you for your help!
UPDATE: test driver, now correct
def random_utf8_seq(self, length):
    # from http://www.w3.org/2001/06/utf-8-test/postscript-utf-8.html
    test_charset = u" !\"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­ ®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĂ㥹ĆćČčĎďĐđĘęĚěĹ弾ŁłŃńŇňŐőŒœŔŕŘřŚśŞşŠšŢţŤťŮůŰűŸŹźŻżŽžƒˆˇ˘˙˛˜˝–—‘’‚“”„†‡•…‰‹›€™"
    utf8_seq = u""
    for i in range(length):
        utf8_seq += random.choice(test_charset)
    return utf8_seq
I get the following error:
input_str = self.random_utf8_seq(200)
File "writer.py", line 226, in random_utf8_seq
print unicode(utf8_seq, "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0xbb in position 0: invalid start byte
If UTF-8 => bytestring conversion is what you want, then you can use str.encode, but first you need to properly mark the type of the source string in your example -- prefix it with u for unicode:
# coding: utf-8
import random

def random_utf8_seq(length):
    # from http://www.w3.org/2001/06/utf-8-test/postscript-utf-8.html
    test_charset = u" !\"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­ ®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĂ㥹ĆćČčĎďĐđĘęĚěĹ弾ŁłŃńŇňŐőŒœŔŕŘřŚśŞşŠšŢţŤťŮůŰűŸŹźŻżŽžƒˆˇ˘˙˛˜˝–—‘’‚“”„†‡•…‰‹›€™"
    utf8_seq = u''
    for i in range(length):
        utf8_seq += random.choice(test_charset)
    print utf8_seq.encode('utf-8')
    return utf8_seq.encode('utf-8')

print( type(random_utf8_seq(200)) )
-- output --
­
õ3×sÔP{Ć.s(Ë°˙ě÷xÓ#bűV—û´ő¢uZÓČn˜0|_"Ðyø`êš·ÏÝhunÍÅ=ä?
óP{tlÇűpb¸7s´ňƒG—čøň\zčłŢXÂYqLĆúěă(ÿî ¥PyÐÔŇnל¦Ì˝+•ì›
ŻÛ°Ñ^ÝC÷ŢŐIñJĹţÒył­"MťÆ‹ČČ4þ!»šåŮ#Öhň-
ÈLGĄ¢ß˛Đ¯.ªÆź˘Ř^ĽÛŹËaĂŕ¹#¢éüÜńlÊqš=VřU…‚–MŽÎÉèoÙŹŠ¨Ð
<type 'str'>
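Note that the answer above is Python 2 code. In Python 3 every str is already unicode, so no u prefix is needed and the round trip is just .encode() and .decode(); a minimal sketch:
s = "Ä€™"
b = s.encode('utf-8')        # b'\xc3\x84\xe2\x82\xac\xe2\x84\xa2'
assert b.decode('utf-8') == s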

How can I decode a list to UTF-8?

I have a list = [0x97, 0x52], not a unicode object. This is the unicode code point of the character '青' (u'\u9752'). How can I change this list to a unicode object first, then encode it to UTF-8?
bytes = [0x97, 0x52]
code = bytes[0] * 256 + bytes[1] # build the 16-bit code
char = unichr(code) # convert code to unicode
utf8 = char.encode('utf-8') # encode unicode as utf-8
print utf8 # prints '青'
Not sure if this is the most elegant way, but it works for this particular example.
>>> ''.join([chr(x) for x in [0x97, 0x52]]).decode('utf-16be')
u'\u9752'
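In Python 3, unichr became chr and the second approach works directly on a bytes object; a small sketch of both:
code_units = [0x97, 0x52]
char = chr(code_units[0] * 256 + code_units[1])     # '青'
utf8 = char.encode('utf-8')                         # b'\xe9\x9d\x92'
assert bytes(code_units).decode('utf-16-be') == char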

Fixed-digit base64 encode and decode in Python

I'm trying to encode and decode a base64 string. It works fine normally, but if I try to restrict the hash to 6 digits, I get an error on decoding:
from base64 import b64encode
from base64 import b64decode
s="something"
base 64 encode/decode:
# Encode:
hash = b64encode(s)
# Decode:
dehash = b64decode(hash)
print dehash
(works)
6-digit base 64 encode/decode:
# Encode:
hash = b64encode(s)[:6]
# Decode:
dehash = b64decode(hash)
print dehash
TypeError: Incorrect padding
What am I doing wrong?
UPDATE:
Based on Mark's answer, I added padding to the 6-digit hash to make it divisible by 4:
hash += "=="
But now the decoded result is "some".
UPDATE 2:
Wow, that was stupid...
Base64 by definition requires padding on the input if the input does not correspond to an integral number of bytes of output. Every 4 base64 characters are turned into 3 bytes, so an input whose length does not divide evenly by 4 raises an error. Note also that truncating the encoded string throws data away: your 6 characters plus "==" padding decode to just 4 bytes, which is why you got back "some" instead of "something".
Wikipedia has a good description of the specifics of Base64.
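If you really do want to decode an arbitrarily truncated base64 string, you can compute the padding instead of hard-coding it, with the caveat that the truncated bytes are gone for good. A minimal Python 3 sketch:
from base64 import b64encode, b64decode

s = b"something"
hash = b64encode(s)[:6]                  # b'c29tZX'
padded = hash + b"=" * (-len(hash) % 4)  # pad length up to a multiple of 4
print(b64decode(padded))                 # b'some' -- the tail of "something" is lost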
