Base64 decoding and encoding give different results - python

I have the following two encoded strings:
base64_str1 = 'eyJzZWN0aW9uX29mZnNldCI6MiwiaXRlbXNfb2Zmc2V0IjozNiwidmVyc2lvbiI6MX0%3D'
base64_str2 = 'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D'
Using an online Base64 decoder/encoder, the results are as follows (these are the right results):
base64_str1_decoded = '{"section_offset":2,"items_offset":36,"version":1}7'
base64_str2_decoded = '{"section_offset":0,"items_offset":0,"version":1}'
However, when I try to encode base64_str1_decoded or base64_str2_decoded back to Base64, I am not able to obtain the initial Base64 strings.
For instance, here is the output of the following code:
base64_str2_decoded = '{"section_offset":0,"items_offset":0,"version":1}'
recoded_str2 = base64.b64encode(bytes(base64_str2_decoded, 'utf-8'))
print(recoded_str2)
# output = b'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ=='
# expected_output = eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D
I tried changing the encoding scheme but can't seem to make it work.

Notice that extra 7 at the end of base64_str1_decoded? That's because your input strings are incorrect. They have escape codes required for URLs. %3D is an escape code for =, which is what should be entered into the online decoder instead. You'll notice the 2nd string in the decoder has an extra ÃÜ on the next line you haven't shown due to using %3D%3D instead of ==. That online decoder is allowing invalid base64 to be decoded.
To correctly decode in Python use urllib.parse.unquote on the string to remove the escaping first:
import base64
import urllib.parse
base64_str1 = 'eyJzZWN0aW9uX29mZnNldCI6MiwiaXRlbXNfb2Zmc2V0IjozNiwidmVyc2lvbiI6MX0%3D'
base64_str2 = 'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D'
# Demonstrate Python decoder detects invalid B64 encoding
try:
    print(base64.b64decode(base64_str1))
except Exception as e:
    print('Exception:', e)
try:
    print(base64.b64decode(base64_str2))
except Exception as e:
    print('Exception:', e)
# Decode after unquoting...
base64_str1_decoded = base64.b64decode(urllib.parse.unquote(base64_str1))
base64_str2_decoded = base64.b64decode(urllib.parse.unquote(base64_str2))
print(base64_str1_decoded)
print(base64_str2_decoded)
# See valid B64 encoding.
recoded_str1 = base64.b64encode(base64_str1_decoded)
recoded_str2 = base64.b64encode(base64_str2_decoded)
print(recoded_str1)
print(recoded_str2)
Output:
Exception: Invalid base64-encoded string: number of data characters (69) cannot be 1 more than a multiple of 4
Exception: Incorrect padding
b'{"section_offset":2,"items_offset":36,"version":1}'
b'{"section_offset":0,"items_offset":0,"version":1}'
b'eyJzZWN0aW9uX29mZnNldCI6MiwiaXRlbXNfb2Zmc2V0IjozNiwidmVyc2lvbiI6MX0='
b'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ=='
Note that the b'' notation is Python's indication that the object is a byte string as opposed to a Unicode string and is not part of the string itself.
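To get back the exact string from the question (with %3D instead of =), the Base64 output has to be URL-escaped again after encoding. A minimal sketch of the full round trip, assuming urllib.parse.quote is an acceptable way to re-apply the escaping (the variable names decoded and recoded are only illustrative):

```python
import base64
import urllib.parse

base64_str2 = 'eyJzZWN0aW9uX29mZnNldCI6MCwiaXRlbXNfb2Zmc2V0IjowLCJ2ZXJzaW9uIjoxfQ%3D%3D'

# URL-unescape first, then Base64-decode.
decoded = base64.b64decode(urllib.parse.unquote(base64_str2))

# Base64-encode, turn the ASCII bytes into a str, then URL-escape again.
# safe='' also escapes '/', which the standard Base64 alphabet can contain.
recoded = urllib.parse.quote(base64.b64encode(decoded).decode('ascii'), safe='')

print(decoded)
print(recoded)
print(recoded == base64_str2)
```

The .decode('ascii') step is there because quote is being given text here, and Base64 output is always plain ASCII.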

Related

ValueError: Data must be aligned to block boundary in ECB mode (or additional backslashes from encoding of encrypted text)

I have this code:
from Crypto.Cipher import DES
# Encryption part
key = b'abcdefgh'
def pad(text):
    while len(text) % 8 != 0:
        text += b' '
    return text
des = DES.new(key, DES.MODE_ECB)
text = b'Secret text'
padded_text = pad(text)
encrypted_text = des.encrypt(padded_text)
print(encrypted_text) # FIRST
# Decryption part
that_encrypted_text = input().encode('utf8')
# This print shows the problem---------------
print(that_encrypted_text) # SECOND
# This print shows the problem --------------
data = des.decrypt(that_encrypted_text)
print(data)
From the FIRST print we can see: b'.\x12\x7f\xcf\xad+\xa9\x0c\xc4\xde\x05\x15\xef\x7f\x16\xa0'
Fill in the input(): .\x12\x7f\xcf\xad+\xa9\x0c\xc4\xde\x05\x15\xef\x7f\x16\xa0
From the SECOND print we can see: b'.\\x12\\x7f\\xcf\\xad+\\xa9\\x0c\\xc4\\xde\\x05\\x15\\xef\\x7f\\x16\\xa0'
And after this (because of additional backslashes) an error appears:
ValueError: Data must be aligned to block boundary in ECB mode
Why do the additional backslashes appear after encoding, and how can I get rid of them so that the message can be decrypted?
I want both parts of the program, encryption and decryption, to work separately. That's why there is an input() for the encrypted text.
Fill in the input(): .\x12\x7f\xcf\xad+\xa9\x0c\xc4\xde\x05\x15\xef\x7f\x16\xa0
is equivalent to r'.\x12\x7f\xcf\xad+\xa9\x0c\xc4\xde\x05\x15\xef\x7f\x16\xa0' (and that is the origin of the doubled backslashes in your SECOND print).
Use
that_encrypted_text = (input().encode( 'raw_unicode_escape')
.decode( 'unicode_escape')
.encode( 'latin1'))
See how the Python-specific text encodings raw_unicode_escape and unicode_escape handle backslashes (and note the role of the latin1 encoding there).
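The effect of that chain can be checked without the cipher at all. The sketch below hard-codes the values from the question (the names raw, typed, naive, recovered and as_hex are only illustrative) and also shows a hex transport as one possible alternative to typing escape sequences, which is an assumption about what would be acceptable for the program:

```python
# What the FIRST print showed: the real 16 ciphertext bytes.
raw = b'.\x12\x7f\xcf\xad+\xa9\x0c\xc4\xde\x05\x15\xef\x7f\x16\xa0'

# What input() returns when that text is typed in: literal backslashes.
typed = r'.\x12\x7f\xcf\xad+\xa9\x0c\xc4\xde\x05\x15\xef\x7f\x16\xa0'

# Encoding the typed text directly keeps every backslash as its own byte,
# so the length is no longer a multiple of the 8-byte DES block.
naive = typed.encode('utf8')
print(len(naive), len(naive) % 8)

# The chain from the answer turns the escape sequences back into bytes.
recovered = (typed.encode('raw_unicode_escape')
                  .decode('unicode_escape')
                  .encode('latin1'))
print(recovered == raw)

# Alternative: print and read the ciphertext as hex, which needs no escapes.
as_hex = raw.hex()
print(as_hex)
print(bytes.fromhex(as_hex) == raw)
```

With hex, the encryption part would print encrypted_text.hex() and the decryption part would call bytes.fromhex(input()).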

Python adds extra to crypt result

I'm trying to create an API with a token to communicate between a Raspberry Pi and a web server. Right now I'm trying to generate a token with Python.
from Crypto.Cipher import AES
import base64
import os
import time
import datetime
import requests
BLOCK_SIZE = 32
BLOCK_SZ = 14
#!/usr/bin/python
salt = "123456789123" # Make sure the salt is always the same length! (12 chars)
iv = "1234567891234567" # Make sure the IV is always the same length! (16 chars)
currentDate = time.strftime("%d%m%Y")
currentTime = time.strftime("%H%M")
PADDING = '{'
pad = lambda s: s + (BLOCK_SIZE - len(s) % BLOCK_SIZE) * PADDING
EncodeAES = lambda c, s: base64.b64encode(c.encrypt(pad(s)))
DecodeAES = lambda c, e: c.decrypt(base64.b64decode(e)).rstrip(PADDING)
secret = salt + currentTime
cipher=AES.new(key=secret,mode=AES.MODE_CBC,IV=iv)
encode = currentDate
encoded = EncodeAES(cipher, encode)
print (encoded)
The problem is that the script's output has an extra b' added to the start of every encoded string, and a ' at the end:
C:\Python36-32>python.exe encrypt.py
b'Qge6lbC+SulFgTk/7TZ0TKHUP0SFS8G+nd5un4iv9iI='
C:\Python36-32>python.exe encrypt.py
b'DTcotcaU98QkRxCzRR01hh4yqqyC92u4oAuf0bSrQZQ='
Hopefully someone can explain what went wrong.
FIXED!
I was able to fix it by decoding the result to UTF-8:
sendtoken = encoded.decode('utf-8')
You are running Python 3.6, which uses Unicode (UTF-8) for string literals. I expect that the EncodeAES() function returns an ASCII string, which Python is indicating is a bytestring rather than a Unicode string by prepending the b to the string literal it prints.
You could strip the b out of the output post-Python, or you could print(str(encoded)), which should give you the same characters, since ASCII is valid UTF-8.
EDIT:
What you need to do is decode the bytestring, as mentioned in the answer and in a comment above. I was wrong about str() doing the conversion for you; you need to call decode('UTF-8') on the bytestring you wish to print. That converts the bytes into a Unicode string (str), which then prints without the b'' wrapper.
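The difference between str() and decode() on a bytes object can be seen in isolation. This is a small sketch that uses an arbitrary byte string in place of the AES output (the names token and text are illustrative only):

```python
import base64

# Stand-in for the encrypted data; any bytes behave the same way here.
token = base64.b64encode(b'\x00\x01 example ciphertext')

# str() on bytes returns the repr, so the b'...' wrapper becomes part of the text.
print(str(token))

# decode() interprets the bytes as text; Base64 output is plain ASCII.
text = token.decode('ascii')
print(text)
print(type(token), type(text))
```

Since Base64 only produces ASCII characters, decode('ascii') and decode('utf-8') give the same result here.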

python 3 prints bytestring and says string is of type str?

The code I have is from a single-sign-on function:
from urllib.parse import unquote
import base64
payload = unquote(payload)
print(payload)
print(type(payload))
decoded = base64.decodestring(payload)
decodestring is complaining that I gave it a string instead of bytes...
File "/Users/Jeff/Development/langalang/proj/discourse/views.py", line 38, in sso
decoded = base64.decodestring(payload)
File "/Users/Jeff/.virtualenvs/proj/lib/python3.6/base64.py", line 559, in decodestring
return decodebytes(s)
File "/Users/Jeff/.virtualenvs/proj/lib/python3.6/base64.py", line 551, in decodebytes
_input_type_check(s)
File "/Users/Jeff/.virtualenvs/proj/lib/python3.6/base64.py", line 520, in _input_type_check
raise TypeError(msg) from err
TypeError: expected bytes-like object, not str
which is fine but when I look at what my print statements printed to the terminal I see this...
b'bm9uY2U9NDI5NDg5OTU0NjU4MjAzODkyNTI=\n'
<class 'str'>
it seems to be saying it is a string of bytes, but then it says that it is a string.
What is going on here?
If I add an encode() to the end of the payload declaration, I see this...
payload = unquote(payload).encode()
b"b'bm9uY2U9NDQxMTQ4MzIyNDMwNjU3MjcyMDM=\\n'"
<class 'bytes'>
EDIT: adding the method that makes the payload
@patch("discourse.views.HttpResponseRedirect")
def test_sso_success(self, mock_redirect):
    """Test for the sso view"""
    # Generating a random number, encoding for url, signing it with a hash
    nonce = "".join([str(random.randint(0, 9)) for i in range(20)])
    # The sso payload needs to be a dict of params
    params = {"nonce": nonce}
    payload = base64.encodestring(urlencode(params).encode())
    print(payload.decode() + " tests")
    key = settings.SSO_SECRET
    h = hmac.new(key.encode(), payload, digestmod=hashlib.sha256)
    signature = h.hexdigest()
    url = reverse("discourse:sso") + "?sso=%s&sig=%s" % (payload, signature)
    req = self.rf.get(url)
    req.user = self.user
    response = sso(req)
    self.assertTrue(mock_redirect.called)
Your payload is generated by base64.encodestring(s), which according to the documentation does the following:
Encode the bytes-like object s, which can contain arbitrary binary
data, and return bytes containing the base64-encoded data, with
newlines (b'\n') inserted after every 76 bytes of output, and ensuring
that there is a trailing newline, as per RFC 2045 (MIME).
Then you apply urllib.parse.unquote to a byte sequence that consists of ASCII characters. At that point your string gains a b' prefix, because the str constructor is run over the payload bytes. As a result you get a str instead of bytes, and one that is not even valid base64.
it seems to be saying it is a string of bytes, but then it says that it is a string.
It looks like what you have here is a string such as "b'bm9uY2U9NDQxMTQ4MzIyNDMwNjU3MjcyMDM=\\n'", so the leading b is not a bytes literal prefix; it is just part of the string's value.
So you need to get rid of these symbols before passing it to the base64 decoder:
from urllib.parse import unquote, quote_from_bytes
import base64
payload = unquote(payload)
print(payload[2:-1])
enc = base64.decodebytes(payload[2:-1].encode())
print(enc)
The original error suggested it, and the display of the encoded string confirms it: your payload string is a unicode string, which happens to begin with the prefix "b'" and end with a single "'".
Such a string is generally built with a repr call:
>>> import ast
>>> b = b'abc' # b is a byte string
>>> r = repr(b) # by construction r is a unicode string
>>> print(r) # will look like a byte string
b'abc'
>>> print(b) # a true byte string prints the same way in Python 3
b'abc'
>>> type(r) # but r is a str, not bytes
<class 'str'>
You can revert to a true byte string with literal_eval:
>>> b2 = ast.literal_eval(r)
>>> type(b2)
<class 'bytes'>
>>> b == b2
True
But the revert is only a workaround and you should track in your code where you build a representation of a byte string.
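In the test method shown in the question, one place where such a representation can be built is the "?sso=%s&sig=%s" % (payload, signature) line, because payload is still bytes there and %s formats bytes using their repr. The sketch below reproduces that in isolation and shows one possible fix. It assumes an arbitrary 20-digit nonce, uses base64.encodebytes (the current name for encodestring), and uses urllib.parse.quote to escape the payload; the names bad_query and good_query are illustrative only:

```python
import base64
from urllib.parse import urlencode, quote, unquote

# Same construction as in the test, with a fixed (made-up) nonce.
payload = base64.encodebytes(urlencode({"nonce": "12345678901234567890"}).encode())

# Formatting bytes with %s embeds their repr, b'...' wrapper included.
bad_query = "sso=%s" % payload
print(bad_query)

# Decode the bytes to text and URL-escape it before building the query string.
good_query = "sso=%s" % quote(payload.decode("ascii"))
print(good_query)

# What the view would then do: unquote, encode back to bytes, Base64-decode.
received = unquote(good_query[len("sso="):])
decoded = base64.decodebytes(received.encode("ascii"))
print(decoded)
```

With the payload escaped this way, the view no longer needs the [2:-1] slicing or literal_eval workaround.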

Type conversion error: the result of urllib.request.urlopen won't work as a character string

In Python 3.x, after reading the contents of a URL with urllib.request.urlopen:
sock = urllib.request.urlopen(url)
code = sock.read (100)
code = code.replace( '\n' , 'enter' )
code = code.replace( '\t' , 'tab' )
I cannot treat 'code' as a string of characters; these lines raise a type conversion error:
code = code.replace( '\n' , 'enter' )
code = code.replace( '\t' , 'tab' )
urllib.request returns bytes values.
Either decode to a string, or use byte literals when replacing:
code = code.replace(b'\n', b'enter')
code = code.replace(b'\t', b'tab')
Decoding requires that you know what codec was used for the textual content. You can see if a content character set was returned:
codec = sock.info().get_param('charset')
If that value is not None you can decode with that codec:
code = code.decode(codec)
The default codec for text/ mimetype responses is ISO-8859-1 (Latin 1), but HTML responses often set the desired codec in a <meta> tag in the header. Leave decoding that to a competent HTML parser, like BeautifulSoup.
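The charset lookup with a fallback can be sketched without a network connection. In the example below an email.message.Message object stands in for what sock.info() returns (http.client.HTTPMessage is a subclass of it); decode_body and its fallback parameter are hypothetical names introduced only for this illustration:

```python
from email.message import Message

def decode_body(body, headers, fallback="iso-8859-1"):
    """Decode response bytes using the declared charset, else the fallback."""
    codec = headers.get_content_charset() or fallback
    return body.decode(codec)

# Stand-in for sock.info(); a real response would supply this object.
headers = Message()
headers["Content-Type"] = "text/html; charset=utf-8"

# Stand-in for sock.read(100).
body = "caf\u00e9\n\tok".encode("utf-8")

code = decode_body(body, headers)
code = code.replace("\n", "enter")
code = code.replace("\t", "tab")
print(code)
```

After decoding, the str-based replace calls from the question work unchanged.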
Before writing:
code = code.replace('\n', 'enter')
code = code.replace('\t', 'tab')
Write this:
code = code.decode('utf-8')
Finally, your code becomes:
code = code.decode('utf-8')
code = code.replace('\n', 'enter')
code = code.replace('\t', 'tab')
urllib.request.urlopen returns bytes data
Note: Because you have f.sock, if you have a judgment urllib.request.urlopen?

Django urlsafe base64 decoding with decryption

I'm writing my own captcha system for user registration. So I need to create a suitable URL for receiving generated captcha pictures. Generation looks like this:
_cipher = cipher.new(settings.CAPTCHA_SECRET_KEY, cipher.MODE_ECB)
_encrypt_block = lambda block: _cipher.encrypt(block + ' ' * (_cipher.block_size - len(block) % _cipher.block_size))
#...
a = (self.rightnum, self.animal_type[1])
serialized = pickle.dumps(a)
encrypted = _encrypt_block(serialized)
safe_url = urlsafe_b64encode(encrypted)
But when I try to receive this key via a GET request in the view function, it fails on urlsafe_b64decode() with a "character mapping must return integer, None or unicode" error:
def captcha(request):
    try:
        key = request.REQUEST['key']
        decoded = urlsafe_b64decode(key)
        decrypted = _decrypt_block(decoded)
        deserialized = pickle.loads(decrypted)
        return HttpResponse(deserialized)
    except KeyError:
        return HttpResponseBadRequest()
I found that the output of urlsafe_b64encode is a str, but the GET request returns a unicode object (it is nevertheless the right string). str() didn't help (it raises a decode error deep inside Django), and if I use key.repr it works, but then the decryptor fails with the error "Input strings must be a multiple of 16 in length".
Inside a test file this whole construction works perfectly. I can't understand what's wrong.
The problem is that b64decode quite explicitly can only take bytes (a string), not unicode.
>>> import base64
>>> test = "Hi, I'm a string"
>>> enc = base64.urlsafe_b64encode(test)
>>> enc
'SGksIEknbSBhIHN0cmluZw=='
>>> uenc = unicode(enc)
>>> base64.urlsafe_b64decode(enc)
"Hi, I'm a string"
>>> base64.urlsafe_b64decode(uenc)
Traceback (most recent call last):
...
TypeError: character mapping must return integer, None or unicode
Since you know that your data only contains ASCII data (that's what base64encode will return), it should be safe to encode your unicode code points as ASCII or UTF-8 bytes, those bytes will be equivalent to the ASCII you expected.
>>> base64.urlsafe_b64decode(uenc.encode("ascii"))
"Hi, I'm a string"
I solved the problem!
deserialized = pickle.loads(captcha_decrypt(urlsafe_b64decode(key.encode('ascii'))))
return HttpResponse(str(deserialized))
But still I don't understand, why it didn't work first time.
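As for why the first attempt failed: in Python 2 the error most likely comes from the decoder translating the URL-safe characters with a byte-string table, which unicode objects do not accept. That is why encoding the key to ASCII first fixes it. The thread above is Python 2; the sketch below shows the same round trip in Python 3 for comparison, using arbitrary bytes in place of the encrypted pickle (the names encrypted, key and decoded are illustrative):

```python
import base64

# Stand-in for the cipher output; these values produce '-' and '_' in the result.
encrypted = bytes(range(250, 256)) * 2

# The key as it would travel in the URL: text, not bytes.
key = base64.urlsafe_b64encode(encrypted).decode('ascii')
print(key)

# Encoding to ASCII before decoding mirrors the fix used in the view.
decoded = base64.urlsafe_b64decode(key.encode('ascii'))
print(decoded == encrypted)
```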
