Django urlsafe base64 decoding with decryption - python

I'm writing my own captcha system for user registration. So I need to create a suitable URL for receiving generated captcha pictures. Generation looks like this:
_cipher = cipher.new(settings.CAPTCHA_SECRET_KEY, cipher.MODE_ECB)
_encrypt_block = lambda block: _cipher.encrypt(block + ' ' * (_cipher.block_size - len(block) % _cipher.block_size))
#...
a = (self.rightnum, self.animal_type[1])
serialized = pickle.dumps(a)
encrypted = _encrypt_block(serialized)
safe_url = urlsafe_b64encode(encrypted)
But then I'm trying to receive this key via GET request in the view function, it fails on urlsafe_b64decode() with "character mapping must return integer, None or unicode" error:
def captcha(request):
try:
key = request.REQUEST['key']
decoded = urlsafe_b64decode(key)
decrypted = _decrypt_block(decoded)
deserialized = pickle.loads(decrypted)
return HttpResponse(deserialized)
except KeyError:
return HttpResponseBadRequest()
I found that on the output of urlsafe_b64encode there is an str, but GET request returns a unicode object (nevertheless it's a right string). Str() didn't help (it returns decode error deep inside django), and if I use key.repr it works, but decryptor doesn't work with an error "Input strings must be a multiple of 16 in length".
Inside a test file all this construction works perfectly, I can't understand, what's wrong?

The problem is that b64decode quite explicitly can only take bytes (a string), not unicode.
>>> import base64
>>> test = "Hi, I'm a string"
>>> enc = base64.urlsafe_b64encode(test)
>>> enc
'SGksIEknbSBhIHN0cmluZw=='
>>> uenc = unicode(enc)
>>> base64.urlsafe_b64decode(enc)
"Hi, I'm a string"
>>> base64.urlsafe_b64decode(uenc)
Traceback (most recent call last):
...
TypeError: character mapping must return integer, None or unicode
Since you know that your data only contains ASCII data (that's what base64encode will return), it should be safe to encode your unicode code points as ASCII or UTF-8 bytes, those bytes will be equivalent to the ASCII you expected.
>>> base64.urlsafe_b64decode(uenc.encode("ascii"))
"Hi, I'm a string"

I solved the problem!
deserialized = pickle.loads(captcha_decrypt(urlsafe_b64decode(key.encode('ascii'))))
return HttpResponse(str(deserialized))
But still I don't understand, why it didn't work first time.

Related

Base64 decoding and encoding give different results

I have the two following encoded string :
base64_str1 = 'eyJzZWN0aW9uX29mZnNldCI6MiwiaXRlbXNfb2Zmc2V0IjozNiwidmVyc2lvbiI6MX0%3D'
base64_str2 = 'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D'
Using Base64 online decoder/encoder , the results are as follow (which are the right results) :
base64_str1_decoded = '{"section_offset":2,"items_offset":36,"version":1}7'
base64_str2_decoded = '{"section_offset":0,"items_offset":0,"version":1}'
However, when I tried to encode base64_str1_decoded or base64_str2_decoded back to Base64, I'm not able to obtain the initial base64 strings.
For instance, the ouput for the following code :
base64_str2_decoded = '{"section_offset":0,"items_offset":0,"version":1}'
recoded_str2 = base64.b64encode(bytes(base64_str2_decoded, 'utf-8'))
print(recoded_str2)
# output = b'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ=='
# expected_output = eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D
I tried changing the encoding scheme but can't seem to make it work.
Notice that extra 7 at the end of base64_str1_decoded? That's because your input strings are incorrect. They have escape codes required for URLs. %3D is an escape code for =, which is what should be entered into the online decoder instead. You'll notice the 2nd string in the decoder has an extra ÃÜ on the next line you haven't shown due to using %3D%3D instead of ==. That online decoder is allowing invalid base64 to be decoded.
To correctly decode in Python use urllib.parse.unquote on the string to remove the escaping first:
import base64
import urllib.parse
base64_str1 = 'eyJzZWN0aW9uX29mZnNldCI6MiwiaXRlbXNfb2Zmc2V0IjozNiwidmVyc2lvbiI6MX0%3D'
base64_str2 = 'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D'
# Demonstrate Python decoder detects invalid B64 encoding
try:
print(base64.b64decode(base64_str1))
except Exception as e:
print('Exception:', e)
try:
print(base64.b64decode(base64_str2))
except Exception as e:
print('Exception:', e)
# Decode after unquoting...
base64_str1_decoded = base64.b64decode(urllib.parse.unquote(base64_str1))
base64_str2_decoded = base64.b64decode(urllib.parse.unquote(base64_str2))
print(base64_str1_decoded)
print(base64_str2_decoded)
# See valid B64 encoding.
recoded_str1 = base64.b64encode(base64_str1_decoded)
recoded_str2 = base64.b64encode(base64_str2_decoded)
print(recoded_str1)
print(recoded_str2)
Output:
Exception: Invalid base64-encoded string: number of data characters (69) cannot be 1 more than a multiple of 4
Exception: Incorrect padding
b'{"section_offset":2,"items_offset":36,"version":1}'
b'{"section_offset":0,"items_offset":0,"version":1}'
b'eyJzZWN0aW9uX29mZnNldCI6MiwiaXRlbXNfb2Zmc2V0IjozNiwidmVyc2lvbiI6MX0='
b'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ=='
Note that the b'' notation is Python's indication that the object is a byte string as opposed to a Unicode string and is not part of the string itself.

ValueError: Ciphertext length must be equal to key size

I know this question was asked a lot, but I still don't really understand how could I debug my code. I was googling this question for over 3 days now, but I just can't find a suitable answer for my code. I want to add the encrypted message with an input but every time I run my code it just gives the
Traceback (most recent call last):
File "decryption.py", line 27, in <module>
label=None
File "/opt/anaconda3/lib/python3.7/site-packages/cryptography/hazmat/backends/openssl/rsa.py", line 357, in decrypt
raise ValueError("Ciphertext length must be equal to key size.")
ValueError: Ciphertext length must be equal to key size.
error. I know its with the label, but I just can't find the answer
Here is my code for the decryption:
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding
with open("private_key.pem", "rb") as key_file:
private_key = serialization.load_pem_private_key(
key_file.read(),
password=None,
backend=default_backend()
)
with open("public_key.pem", "rb") as key_file:
public_key = serialization.load_pem_public_key(
key_file.read(),
backend=default_backend()
)
textInput = str(input(">>> "))
encrypted = textInput.encode()
original_message = private_key.decrypt(
encrypted,
padding.OAEP(
mgf=padding.MGF1(algorithm=hashes.SHA256()),
algorithm=hashes.SHA256(),
label=None
)
)
original_decoded_message = original_message.decode("utf-8")
print(original_decoded_message)
And for the encryption:
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives import serialization
private_key = rsa.generate_private_key(
public_exponent=65537,
key_size=2048,
backend=default_backend()
)
public_key = private_key.public_key()
pem = private_key.private_bytes(
encoding=serialization.Encoding.PEM,
format=serialization.PrivateFormat.PKCS8,
encryption_algorithm=serialization.NoEncryption()
)
with open('private_key.pem', 'wb') as f:
f.write(pem)
pem = public_key.public_bytes(
encoding=serialization.Encoding.PEM,
format=serialization.PublicFormat.SubjectPublicKeyInfo
)
with open('public_key.pem', 'wb') as f:
f.write(pem)
raw_message = str(input(">>> "))
message = raw_message.encode("utf-8")
encrypted = public_key.encrypt(
message,
padding.OAEP(
mgf=padding.MGF1(algorithm=hashes.SHA256()),
algorithm=hashes.SHA256(),
label=None
)
)
print(encrypted)
Im really new to Asymmetric encryption / decryption.
This minor modification in the decryption part of your code made it work for me:
textInput = str(input(">>> "))
# encrypted = textInput.encode() #this is incorrect!
encrypted = eval(textInput)
This is what I get:
For the encryption part:
>>> test message
b'>xB)\xf1\xc5I\xd6\xce\xfb\xcf\x83\xe2\xc5\x8f\xcfl\xb9\x0f\xa2\x13\xa5\xe1\x03\xf7p\xb3\x9c\xeb\r\xc1"\xf2\x17\x8b\xea\t\xed\xb2xG\xb7\r\xa9\xf8\x03eBD\xdd9>\xbe\xd1O\xe2\x9f\xbb\xf9\xff5\x96l\xea\x17FI\x8d\x02\x05\xea\x1dpM\xbb\x04J\xfc\x0c\\\xfe\x15\x07\xaf \x9e\xc2\xf9M\xa4\x1d$\xc3\x99my\xb6\xc5\xad\x97\xd06\xd2\x08\xd3\xe2\xc8H\xca\xd8\xfd{\xe6\xc6\xa3\x18\xeb\xe6\xcc\xc5\x9a\xc8*\xbb\xc1\x8c\x80,\x1f\r#\x9b\x9d\xc5\x91I\xa8\xc01y\xbc\xa73\xd3\x19;\xef\x8a\xfb\xc2\xc4\x9e\xbe\x8f\xeb\x1d\x12\xbd\xe4<\xa0\xbb\x8d\xef\xee\xa3\x89E\x07"m\x1d\xb0\xf3\xd2:y\xd9\xbd\xef\xdf\xc9\xbb\x1b\xd5\x03\x91\xa4l\x8bS\x9e\x80\x14\x90\x18\xc4\x9e\\?\x8eF\x05\xa1H\x9e:\x0c\x96\x8e\xb3E3\x90\xa2\xa1\xd9\x88\xa0<X\x7f\rIP\x00\xbf\xf6\x15\xfb9tW\x17\x9f\xca\x95\xf6|\xd7\x90\xbcp\xe5\xb5,V\x1b\xe9\x90\xf6\x87 v=6'
Now for the decryption part, I use the output:
>>> b'>xB)\xf1\xc5I\xd6\xce\xfb\xcf\x83\xe2\xc5\x8f\xcfl\xb9\x0f\xa2\x13\xa5\xe1\x03\xf7p\xb3\x9c\xeb\r\xc1"\xf2\x17\x8b\xea\t\xed\xb2xG\xb7\r\xa9\xf8\x03eBD\xdd9>\xbe\xd1O\xe2\x9f\xbb\xf9\xff5\x96l\xea\x17FI\x8d\x02\x05\xea\x1dpM\xbb\x04J\xfc\x0c\\\xfe\x15\x07\xaf \x9e\xc2\xf9M\xa4\x1d$\xc3\x99my\xb6\xc5\xad\x97\xd06\xd2\x08\xd3\xe2\xc8H\xca\xd8\xfd{\xe6\xc6\xa3\x18\xeb\xe6\xcc\xc5\x9a\xc8*\xbb\xc1\x8c\x80,\x1f\r#\x9b\x9d\xc5\x91I\xa8\xc01y\xbc\xa73\xd3\x19;\xef\x8a\xfb\xc2\xc4\x9e\xbe\x8f\xeb\x1d\x12\xbd\xe4<\xa0\xbb\x8d\xef\xee\xa3\x89E\x07"m\x1d\xb0\xf3\xd2:y\xd9\xbd\xef\xdf\xc9\xbb\x1b\xd5\x03\x91\xa4l\x8bS\x9e\x80\x14\x90\x18\xc4\x9e\\?\x8eF\x05\xa1H\x9e:\x0c\x96\x8e\xb3E3\x90\xa2\xa1\xd9\x88\xa0<X\x7f\rIP\x00\xbf\xf6\x15\xfb9tW\x17\x9f\xca\x95\xf6|\xd7\x90\xbcp\xe5\xb5,V\x1b\xe9\x90\xf6\x87 v=6'
test message
The problem with your original code is that you are writing the representation of a bytes string as a unicode string (when using print(encrypted)), then encoding it into a bytes object in the decrypt code, and then passing it to the decrypt function. Encoding this string will not yield the original bytes string encrypted.
This example illustrates the problem:
>>> x = bytes(bytearray.fromhex('f3'))
>>> x #A bytes string, similar to encrypt
b'\xf3'
>>> print(x)
b'\xf3'
>>> len(x)
1
>>> str(x) #this is what print(x) writes to stdout
"b'\\xf3'"
>>> print(str(x))
b'\xf3'
>>> len(str(x)) #Note that the lengths are different!
7
>>> x == str(x)
False
>>> str(x).encode() #what you were trying to do
b"b'\\xf3'"
>>> x == str(x).encode()
False
>>> eval(str(x)) #what gets the desired result
b'\xf3'
>>> x == eval(str(x))
True
The thing is that the print function prints the representation of the object rather than the object itself. Essentially, it does this by getting a printable value of inherently unprintable objects by using __repr__ or __str__ methods of that object.
This is the documentation for __str__:
Called by str(object) and the built-in functions format() and print()
to compute the “informal” or nicely printable string representation of
an object. The return value must be a string object.
.
.
The default
implementation defined by the built-in type object calls
object.__repr__().
and __repr__:
Called by the repr() built-in function to compute the “official”
string representation of an object. If at all possible, this should
look like a valid Python expression that could be used to recreate an
object
This "official" representation returned by __repr__ is what allowed me to use the eval function in the decrypt code to solve the problem.
TL;DR:
copying the output of print(encrypted) copies the informal, human readable value returned by encrypted.__str__() or the "official" representation returned by encrypted.__repr__(), which are both in a human readable encoding such as utf-8. They cannot be encoded back(by using utf-8 encoding) to create the original bytes string that they represent.
It is also worth looking into this question from a stackoverflow user, facing the same issue. The answer gives a way to actually encode this representation back into a bytes object if you want. It's worth looking into this because eval should be avoided, it is very unsafe. For that method to work, you would have to remove the leading b' and trailing ' from the string textInput before encoding it.
My final recommendation would be to pass the entire object encrypted between the two programs by either using files with the pickle module, or some form of socket programming. Best to stay away from using print and input to pass data like this between programs to avoid encoding issues.

How to decode a string representation of a bytes object?

I have a string which includes encoded bytes inside it:
str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
I want to decode it, but I can't since it has become a string. Therefore I want to ask whether there is any way I can convert it into
str2 = b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'
Here str2 is a bytes object which I can decode easily using
str2.decode('utf-8')
to get the final result:
'Output file 문항분석.xlsx Created'
You could use ast.literal_eval:
>>> print(str1)
b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'
>>> type(str1)
<class 'str'>
>>> from ast import literal_eval
>>> literal_eval(str1).decode('utf-8')
'Output file 문항분석.xlsx Created'
Based on the SyntaxError mentioned in your comments, you may be having a testing issue when attempting to print due to the fact that stdout is set to ascii in your console (and you may also find that your console does not support some of the characters you may be trying to print). You can try something like the following to set sys.stdout to utf-8 and see what your console will print (just using string slice and encode below to get bytes rather than the ast.literal_eval approach that has already been suggested):
import codecs
import sys
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.buffer)
s = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
b = s[2:-1].encode().decode('utf-8')
A simple way is to assume that all the characters of the initial strings are in the [0,256) range and map to the same Unicode value, which means that it is a Latin1 encoded string.
The conversion is then trivial:
str1[2:-1].encode('Latin1').decode('utf8')
Finally I have found an answer where i use a function to cast a string to bytes without encoding.Given string
str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
now i take only actual encoded text inside of it
str1[2:-1]
and pass this to the function which convert the string to bytes without encoding its values
import struct
def rawbytes(s):
"""Convert a string to raw bytes without encoding"""
outlist = []
for cp in s:
num = ord(cp)
if num < 255:
outlist.append(struct.pack('B', num))
elif num < 65535:
outlist.append(struct.pack('>H', num))
else:
b = (num & 0xFF0000) >> 16
H = num & 0xFFFF
outlist.append(struct.pack('>bH', b, H))
return b''.join(outlist)
So, calling the function would convert it to bytes which then is decoded
rawbytes(str1[2:-1]).decode('utf-8')
will give the correct output
'Output file 문항분석.xlsx Created'

python 3 prints bytestring and says string is of type str?

The code I have is from an single-sign-on function
from urllib.parse import unquote
import base64
payload = unquote(payload)
print(payload)
print(type(payload))
decoded = base64.decodestring(payload)
decodestring is complaining that I gave it a string instead of bytes...
File "/Users/Jeff/Development/langalang/proj/discourse/views.py", line 38, in sso
decoded = base64.decodestring(payload)
File "/Users/Jeff/.virtualenvs/proj/lib/python3.6/base64.py", line 559, in decodestring
return decodebytes(s)
File "/Users/Jeff/.virtualenvs/proj/lib/python3.6/base64.py", line 551, in decodebytes
_input_type_check(s)
File "/Users/Jeff/.virtualenvs/proj/lib/python3.6/base64.py", line 520, in _input_type_check
raise TypeError(msg) from err
TypeError: expected bytes-like object, not str
which is fine but when I look at what my print statements printed to the terminal I see this...
b'bm9uY2U9NDI5NDg5OTU0NjU4MjAzODkyNTI=\n'
<class 'str'>
it seems to be saying it is a string of bytes, but then it says that it is a string.
What is going on here?
if I add a encode() to the end of the payload declaration I see this...
payload = unquote(payload).encode()
b"b'bm9uY2U9NDQxMTQ4MzIyNDMwNjU3MjcyMDM=\\n'"
<class 'bytes'>
EDIT: adding the method that makes the payload
#patch("discourse.views.HttpResponseRedirect")
def test_sso_success(self, mock_redirect):
"""Test for the sso view"""
# Generating a random number, encoding for url, signing it with a hash
nonce = "".join([str(random.randint(0, 9)) for i in range(20)])
# The sso payload needs to be a dict of params
params = {"nonce": nonce}
payload = base64.encodestring(urlencode(params).encode())
print(payload.decode() + " tests")
key = settings.SSO_SECRET
h = hmac.new(key.encode(), payload, digestmod=hashlib.sha256)
signature = h.hexdigest()
url = reverse("discourse:sso") + "?sso=%s&sig=%s" % (payload, signature)
req = self.rf.get(url)
req.user = self.user
response = sso(req)
self.assertTrue(mock_redirect.called)
As you payload is generate by this base64.encodestring(s) which is by documentation is:
Encode the bytes-like object s, which can contain arbitrary binary
data, and return bytes containing the base64-encoded data, with
newlines (b'\n') inserted after every 76 bytes of output, and ensuring
that there is a trailing newline, as per RFC 2045 (MIME).
Then you do urllib.parse.unquote to a byte sequence that consists of ASCII chars. At that moment you got a prefix of b' to your string as unquote runs str constructor over payload bytearray. As a request you get a str instead of bytes , which is moreover a not valid base64 encoded.
it seems to be saying it is a string of bytes, but then it says that it is a string.
Looks like you have here string looks like: "b'bm9uY2U9NDQxMTQ4MzIyNDMwNjU3MjcyMDM=\\n'" so leading b is not byte literal it is just part of string's value.
So you need to rid off this symbols before pass it to base64 decoder:
from urllib.parse import unquote, quote_from_bytes
import base64
payload = unquote(payload)
print(payload[2:-1])
enc = base64.decodebytes(payload[2:-1].encode())
print(enc)
The original error allowed to think that, and the display of the encoded string confirms it: your payload string is a unicode string, which happens to begin with the prefix "b'" and ends with a single "'".
Such a string is generally built with a repr call:
>>> b = b'abc' # b is a byte string
>>> r = repr(b) # by construction r is a unicode string
>>> print(r) # will look like a byte string
b'abc'
>>> print(b) # what is printed for a true byte string
abc
You can revert to a true byte string with literal_eval:
>>> b2 = ast.literal_eval(r)
>>> type(b2)
<class 'bytes'>
>>> b == b2
True
But the revert is only a workaround and you should track in your code where you build a representation of a byte string.

Python base64 encryptstring

I have tried to build a function that can translate an encrypted string which generated by this function:
def encrypt2(message,key):
return base64.encodestring("".join([chr(ord(message[i]) ^ ord(key[i % len(key)]))
for i in xrange(len(message))]))
So the function take a string and a key which have 4 digits (the key is a string), and encode him using the encodestring function.
The function I build to decode it looks the same but with one difference, the encodestring changes to decodestring. To my opinion it should work, but it prints an error message, "incorrect padding".
What should I change in the decryption function to solve this problem?
def decrypt2(message,key):
return (base64.decodestring("".join([chr(ord(message[i]) ^ ord(key[i % len(key)]))
for i in xrange(len(message))])))
This what I tried to execute:
message = "n"
key = "8080"
encrypt = encrypt2(message,key)
decrypt = decrypt2(encrypt,key)
print "Message: %s \nEncrypt: %s \nDecrypt: %s" %(message,encrypt,decrypt)
and as I said, the line with the decryption returned an "incorrect padding" error.
Your (insecure) "encryption" function returns ordinary base64-encoded text, encoding your ciphertext.
Your decryption function decrypts the input text (which is actually base64), then passes the "decrypted" text to the base64 decoder.
Since the original base64 text is not encrypted, trying to decrypt it results in garbage, which is not valid base64.
To make the suggestions concrete, replace your decoding function like so:
def decrypt2(message,key):
decoded = base64.decodestring(message)
return "".join(chr(ord(decoded[i]) ^ ord(key[i % len(key)]))
for i in xrange(len(decoded)))
There are more efficient ways to do this ;-) For example,
def decrypt2(message, key):
from itertools import cycle
decoded = base64.decodestring(message)
return "".join(chr(a ^ b) for a, b in zip(map(ord, decoded),
cycle(map(ord, key))))

Categories