Python UnicodeDecodeError: 'utf-8' codec can't decode byte

Python UnicodeDecodeError: 'utf-8' codec can't decode byte - python

First, I'm sorry, there's so many questions regarding this issue, and I drowned in so many answers, but I still don't get it, because my code works just if I write to the file(?)
So, why this works:
# encrypt and write to file
data_to_encrypt = "mytext"
data = data_to_encrypt.encode('utf-8')
ciphered_bytes = cipher_encrypt.encrypt(data)
ciphered_data = ciphered_bytes
print(ciphered_data)
with open(fileConfigBIN, "wb") as binary_file:
binary_file.write(ciphered_data)
# read the file and decrypt
with open(fileConfigBIN, "rb") as binary_file:
data1 = binary_file.read()
print(data1)
deciphered_bytes = cipher_decrypt.decrypt(data1)
decrypted_data = deciphered_bytes.decode('utf-8')
print(decrypted_data)
Output:
b'\xa0U\xee\xda\xa8R'
b'\xa0U\xee\xda\xa8R'
mytext
But just by commenting the write to file (the file still there with the same information):
#data_to_encrypt = "mytext"
#data = data_to_encrypt.encode('utf-8')
#ciphered_bytes = cipher_encrypt.encrypt(data)
#ciphered_data = ciphered_bytes
#print(ciphered_data)
#with open(fileConfigBIN, "wb") as binary_file:
# binary_file.write(ciphered_data)
with open(fileConfigBIN, "rb") as binary_file:
data1 = binary_file.read()
print(data1)
deciphered_bytes = cipher_decrypt.decrypt(data1)
decrypted_data = deciphered_bytes.decode('utf-8')
print(decrypted_data)
I get this:
b'\xa0U\xee\xda\xa8R'
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Users...................\Python37-32\lib\tkinter__init__.py", line 1705, in call
return self.func(*args)
File "C:\Users\Fabio\source.............", line 141, in AbrirConfig
decrypted_data = deciphered_bytes.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdd in position 1: invalid continuation byte
The read code is using the same file as the commented part save it!
I tried to save and read in 'latin-1', 'ISO-8859-1' and others.. it give no errors, but returns just strange characters
Using Python 3.7
Thank's in advance!
EDIT:
Full working minimal code:
from Crypto.Cipher import AES
fileConfigBIN = "data.dat"
key = b'\x16\x18\xed\x1c^\xaaGN\rl\xc0]\xf0t=\xd0\xdc]t\xaf\xb2\x12,\xe6\xfc\xd6\x11-\x10\xb4\xb1\x0b'
cipher_encrypt = AES.new(key, AES.MODE_CFB)
iv = cipher_encrypt.iv
cipher_decrypt = AES.new(key, AES.MODE_CFB, iv=iv)
def Encrypt():
data_to_encrypt = "MyTextToEncrypt"
data = data_to_encrypt.encode('utf-8')
ciphered_bytes = cipher_encrypt.encrypt(data)
ciphered_data = ciphered_bytes
with open(fileConfigBIN, "wb") as binary_file:
binary_file.write(ciphered_data)
print("CRYPTED DATA SAVED TO FILE: ", str(ciphered_data))
def Decrypt():
with open(fileConfigBIN, "rb") as binary_file:
data1 = binary_file.read()
print("DATA READ FROM FILE: ", data1)
deciphered_bytes = cipher_decrypt.decrypt(data1)
decrypted_data = deciphered_bytes.decode('utf-8')
print("DECRYPTED DATA: ", decrypted_data)
Encrypt() #comment this function to show the problem
print('---------------------')
Decrypt()
I did it guys! Thanks to #Hoenie insight, for every encryption, I store the iv value and use it to decrypt when needed. If happens to need to encrypt again, it save another iv value and so on... Thanks thanks!

Ok, it seems this does have nothing to do with encoding/decoding.
You need to create a new cipher object, because it can only be used once.
def Decrypt():
cipher_decrypt = AES.new(key, AES.MODE_CFB, iv=iv) # <-- add this
with open(fileConfigBIN, "rb") as binary_file:
data1 = binary_file.read()
print("DATA READ FROM FILE: ", data1)
deciphered_bytes = cipher_decrypt.decrypt(data1)
decrypted_data = deciphered_bytes.decode('utf-8')
print("DECRYPTED DATA: ", decrypted_data)
In the source code: GitHub
A cipher object is stateful: once you have decrypted a message
you cannot decrypt (or encrypt) another message with the same
object.
Thanks to: https://stackoverflow.com/a/54082879/7547749

Related

AES encryption and padding across multiple blocks

I am encrypting a large (100GB+) file with Python using PyCryptodome using AES-256 in CBC mode.
Rather than read the entire file into memory and encrypt it in one fell swoop, I would like to read the input file a 'chunk' at a time and append to the output file with the results of encrypting each 'chunk.'
Regrettably, the documentation for PyCryptodome is lacking in that I can't find any examples of how to encrypt a long plaintext with multiple calls to encrypt(). All the examples use a short plaintext and encrypt the entire plaintext in a single call to encrypt().
I had assumed that if my input 'chunk' is a multiple of 16 bytes (the block size of AES in CBC mode) I wouldn't need to add padding to any 'chunk' but the last one. However, I wasn't able to get that to work. (I got padding errors while decrypting.)
I'm finding that in order to successfully decrypt the file, I need to add padding to every 'chunk' when encrypting, and decrypt in units of the input chunk size plus 16 bytes. This means the decrypting process needs to know the 'chunk size' used for encryption, which makes me believe that this is probably an incorrect implementation.
While I do have my encryption/decryption working as described, I wonder if this is the 'correct' way to do it. (I suspect it is not.) I've read inconsistent claims on whether or not every such 'chunk' needs padding. If not, I'd like some handholding to get Pycryptodome to encrypt and then decrypt a large plaintext across multiple calls to encrypt() and decrypt().
EDIT: This code throws a ValueError, "Padding is incorrect," when decrpyting the first 'chunk'.
def encrypt_file(infile, outfile, aeskey, iv):
cipher = AES.new(aeskey, AES.MODE_CBC, iv)
with open(infile, "rb") as fin:
with open(outfile, "wb") as fout:
while True:
data = fin.read(16 * 32)
if len(data) ==0:
break
insize = len(data)
if insize == (16 * 32):
padded_data = data
else:
padded_data = pad(data, AES.block_size)
fout.write(cipher.encrypt(padded_data))
def decrypt_file(infile, outfile, aeskey, iv):
cipher = AES.new(aeskey, AES.MODE_CBC, iv)
with open (infile, "rb") as fin:
with open(outfile, "wb") as fout:
while True:
data = fin.read(16 * 32)
if len(data) == 0:
break
fout.write(unpad(cipher.decrypt(data), AES.block_size))

My problem was related to the PAD of the last block. It is necessary to detect which is the last fragment read in bytes in order to add the PAD.
def decrypt_file(
self, filename: str, output_file: str, save_path: str, key, iv
):
cipher_aes = AES.new(key, AES.MODE_CBC, iv)
log.info(f'Decrypting file: {filename} output: {output_file}')
count = 0
previous_data = None
with open(filename, "rb") as f, open(
f"{save_path}/{output_file}", "wb"
) as f2:
while True:
count+=1
data = f.read(self.block_size)
if data == b"":
decrypted = cipher_aes.decrypt(previous_data)
log.info(f'Last block UnPadding Count: {count} BlockSize: {self.block_size}')
decrypted = unpad(decrypted, AES.block_size, style="pkcs7")
f2.write(decrypted)
break
if previous_data:
decrypted = cipher_aes.decrypt(previous_data)
f2.write(decrypted)
previous_data = data
And apply the decrypt:
def decrypt_file(
self, filename: str, output_file: str, save_path: str, key, iv
):
cipher_aes = AES.new(key, AES.MODE_CBC, iv)
log.info(f'Decrypting file: {filename} output: {output_file}')
count = 0
previous_data = None
with open(filename, "rb") as f, open(
f"{save_path}/{output_file}", "wb"
) as f2:
while True:
count+=1
data = f.read(self.block_size)
if data == b"":
decrypted = cipher_aes.decrypt(previous_data)
log.info(f'Last block UnPadding Count: {count} BlockSize: {self.block_size}')
decrypted = unpad(decrypted, AES.block_size, style="pkcs7")
f2.write(decrypted)
break
if previous_data:
decrypted = cipher_aes.decrypt(previous_data)
f2.write(decrypted)
previous_data = data

It looks like the fix is to do similar chunksize/padding comparison in the decrypt function as I used in the encrypt function:
def decrypt_file(infile, outfile, aeskey, iv):
cipher = AES.new(aeskey, AES.MODE_CBC, iv)
with open (infile, "rb") as fin:
with open(outfile, "wb") as fout:
while True:
data = fin.read(16 * 32)
if len(data) == 0:
break
if len(data) == (16 * 32):
decrypted_data = cipher.decrypt(data)
else:
decrypted_data = unpad(cipher.decrypt(data), AES.block_size)
fout.write(decrypted_data)

cryptography.fernet.InvalidToken problem with cryptography

Getting this error when trying to run this:
File "Test Files.py", line 502, in decryptdefault
decrypted = fernet.decrypt(d)
File "/usr/lib/python3/dist-packages/cryptography/fernet.py", line 74, in decrypt
timestamp, data = Fernet._get_unverified_token_data(token)
File "/usr/lib/python3/dist-packages/cryptography/fernet.py", line 92, in _get_unverified_token_data
raise InvalidToken
cryptography.fernet.InvalidToken
FYI dk variable is defined with key (default key)
dk = 'niwaXsYbDiAxmLiqRiFbDa_8gHio15sNQ6ZO-sQ0nR4='
# Decrypts the file with default key
def decryptdefault(inclufile):
Key = dk
fernet = Fernet(Key)
readfile = open(inclufile, 'rb')
d = readfile.read()
readfile.close()
# Decrypts and puts it into the text
if readfile != "":
decrypted = fernet.decrypt(d)
decrypted = str(decrypted).replace('b\'', '', 1)
decrypted = decrypted[:-3]
return str(decrypted)
Edit: I added the key for those who asked

I have found out, through trial and error with the same project later down the line, that you need to turn your key into something like this key = b'niwaXsYbDiAxmLiqRiFbDa_8gHio15sNQ6ZO-sQ0nR4='
The main difference being the key is encoded in a utf-8 format and is now readable by Fernet and doesn't return that error. Here is a function that uses Tkinter, Fernet, and os to actually decrypt my file.
# Propriatary method of encrypting files
def decrypt(self, file):
with open(file, 'rb') as readfile:
contents = readfile.read()
self.title(os.path.basename(file) + ' - SecureNote')
# self.textbox is a variable inside of the class I am using for my window
self.textbox.delete(1.0, tk.END)
if contents != "":
# getword retur
Key = bytes(getword('Key:', 1), encoding="utf-8")
fernet = Fernet(Key)
decrypted = fernet.decrypt(contents).decode('utf-8')
self.textbox.insert(1.0, str(decrypted))
del Key
del fernet
else:
pass

Encrypting a file using saved RSA keys in python

I am trying to encrypt an image file using RSA keys that are generated by another script and saved into a .pem file. when i am trying to encrypt the file its showing errors like this
Traceback (most recent call last):
File "rsaencrypt.py", line 85, in <module>
main()
File "rsaencrypt.py", line 45, in main
content = fileObj.read()
File "/usr/lib64/python3.7/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
I am new to python and file handling so i think the problem is in the way of how i am handling the files both the key files and the inputfile. Looking forward to some suggestion.
Heres the code of my encryption file:
import time, os, sys
def main():
inputFilename = 'img.jpg'
# BE CAREFUL! If a file with the outputFilename name already exists,
# this program will overwrite that file.
outputFilename = 'encrypted.jpg'
myKey = open("public_key.pem",'r')
myMode = 'encrypt' # set to 'encrypt' or 'decrypt'
# If the input file does not exist, then the program terminates early.
if not os.path.exists(inputFilename):
print('The file %s does not exist. Quitting...' % (inputFilename))
sys.exit()
# If the output file already exists, give the user a chance to quit.
if os.path.exists(outputFilename):
print('This will overwrite the file %s. (C)ontinue or (Q)uit?' % (outputFilename))
response = input('> ')
if not response.lower().startswith('c'):
sys.exit()
# Read in the message from the input file
fileObj = open(inputFilename)
content = fileObj.read()
fileObj.close()
print('%sing...' % (myMode.title()))
# Measure how long the encryption/decryption takes.
startTime = time.time()
if myMode == 'encrypt':
translated = transpositionEncrypt.encryptMessage(myKey, content)
elif myMode == 'decrypt':
translated = transpositionDecrypt.decryptMessage(myKey, content)
totalTime = round(time.time() - startTime, 2)
print('%sion time: %s seconds' % (myMode.title(), totalTime))
# Write out the translated message to the output file.
outputFileObj = open(outputFilename, 'w')
outputFileObj.write(translated)
outputFileObj.close()
print('Done %sing %s (%s characters).' % (myMode, inputFilename, len(content)))
print('%sed file is %s.' % (myMode.title(), outputFilename))
# If transpositionCipherFile.py is run (instead of imported as a module)
# call the main() function.
if __name__ == '__main__':
main()

You need to open the file in binary mode, not text (which is the default).
Turn
fileObj = open(inputFilename)
into
fileObj = open(inputFilename, "rb")
and .read() will return bytes (i.e. binary data), not str (i.e. text).

Encrypt data in PostgreSQL and decrypt the data in Python

I have data in my database that I need to encrypt. I will then download the database to csv files. I have a python program that can decrypt the specific columns in a csv file. The problem is that I don't get my data out from the python program.
sql function:
CREATE OR REPLACE FUNCTION AESEncrypt (data TEXT,pass TEXT)
RETURNS TEXT AS $crypted$
declare
crypted TEXT;
key BYTEA;
iv BYTEA;
BEGIN
key := digest(convert_to(pass, 'utf-8'), 'sha256');
iv := digest(convert_to(CONCAT(data , 'salt'), 'utf-8'), 'md5');
crypted := encode(encrypt_iv(convert_to(data, 'utf-8'), key, iv, 'aes'), 'base64');
RETURN crypted;
END;
$crypted$ LANGUAGE plpgsql;
python program:
import csv
import time
import base64
from hashlib import sha256, md5
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend
password = 'Password'
inputFile = 'test.txt'
outputFile = 'out.txt'
delimiter = ';'
columns = [0]
backend = default_backend()
key = sha256(password.encode('utf-8')).digest()
iv = md5((password + 'salt').encode('utf-8')).digest()
cipher = Cipher(algorithms.AES(key), modes.CBC(iv), backend=backend)
def encrypt(input):
input = bytes(input, 'utf-8')
#Padding
length = 16 - (len(input) % 16)
input += bytes([length])*length
#Encrypt
encryptor = cipher.encryptor()
return base64.b64encode(encryptor.update(input) + encryptor.finalize()).decode("utf-8")
def decrypt(input):
input = base64.b64decode(input)
decryptor = cipher.decryptor()
data = decryptor.update(input) + decryptor.finalize()
data = data[:-data[-1]] #Remove padding
print(data)
return data.decode('utf-8')
def main():
start_time = time.time()
with open(inputFile, 'r') as csvfileIn:
with open(outputFile, 'w', newline='') as csvfileOut:
spamreader = csv.reader(csvfileIn, delimiter=delimiter)
spamwriter = csv.writer(csvfileOut, delimiter=delimiter)
firstRow = True
for row in spamreader:
if not firstRow:
for pos in columns:
row[pos] = decrypt(row[pos])
firstRow = False
spamwriter.writerow(row)
print("--- %s seconds ---" % (time.time() - start_time))
main()
If I encrypt the file with the encrypt function written in the python program then I get the correct result if i would decrypt it.
If I would call the sql funcion as AESEncrypt('data', 'Password') then it returns the base64 string Ojq6RKg7NgDx8YFdLzfVhQ==
But after decryption I get the empty string as result and not the string data. If I look at the print statment before the utf-8 decode step in the decryption function it prints out the following on the console b'', so it looks like it could be something wrong with the padding. If I would print before I remove the padding I get b'\x85\x90sz\x0cQS\x9bs\xeefvA\xc63-'. If I will encrypt a long sentence then I will actually see parts of the text in the byte outputs above.
Do anyone know what I have done wrong?

ValueError: AES key must be either 16, 24, or 32 bytes long PyCrypto 2.7a1

I'm making programm for my school project and have one problem above.
Here's my code:
def aes():
#aes
os.system('cls')
print('1. Encrypt')
print('2. Decrypt')
c = input('Your choice:')
if int(c) == 1:
#cipher
os.system('cls')
print("Let's encrypt, alright")
print('Input a text to be encrypted')
text = input()
f = open('plaintext.txt', 'w')
f.write(text)
f.close()
BLOCK_SIZE = 32
PADDING = '{'
pad = lambda s: s + (BLOCK_SIZE - len(s) % BLOCK_SIZE) * PADDING
EncodeAES = lambda c, s: base64.b64encode(c.encrypt(pad(s)))
secret = os.urandom(BLOCK_SIZE)
f = open('aeskey.txt', 'w')
f.write(str(secret))
f.close()
f = open('plaintext.txt', 'r')
privateInfo = f.read()
f.close()
cipher = AES.new(secret)
encoded = EncodeAES(cipher, privateInfo)
f = open('plaintext.txt', 'w')
f.write(str(encoded))
f.close()
print(str(encoded))
if int(c) == 2:
os.system('cls')
print("Let's decrypt, alright")
f = open('plaintext.txt','r')
encryptedString = f.read()
f.close()
PADDING = '{'
DecodeAES = lambda c, e: c.decrypt(base64.b64decode(e)).rstrip(PADDING)
encryption = encryptedString
f = open('aeskey.txt', 'r')
key = f.read()
f.close()
cipher = AES.new(key)
decoded = DecodeAES(cipher, encryption)
f = open('plaintext.txt', 'w')
f.write(decoded)
f.close()
print(decoded)
Full error text:
Traceback (most recent call last): File "C:/Users/vital/Desktop/Prog/Python/Enc_dec/Enc_dec.py", line 341, in aes()
File "C:/Users/vital/Desktop/Prog/Python/Enc_dec/Enc_dec.py", line 180, in aes cipher = AES.new(key)
File "C:\Users\vital\AppData\Local\Programs\Python\Python35-32\lib\site-packages\Crypto\Cipher\AES.py", line 179, in new return AESCipher(key, *args, **kwargs)
File "C:\Users\vital\AppData\Local\Programs\Python\Python35-32\lib\site-packages\Crypto\Cipher\AES.py", line 114, in init blockalgo.BlockAlgo.init(self, _AES, key, *args, **kwargs)
File "C:\Users\vital\AppData\Local\Programs\Python\Python35-32\lib\site-packages\Crypto\Cipher\blockalgo.py", line 401, in init self._cipher = factory.new(key, *args, **kwargs)
ValueError: AES key must be either 16, 24, or 32 bytes long
Process finished with exit code 1
What am I doing wrong?

The error is very clear. The key must be exactly of that size. os.urandom will return you the correct key. However this key is a bytes (binary string value). Furthermore, by using str(secret), the value of repr(secret) is written into the file instead of secret.
What is more confusing is that AES.new allows you to pass the key as Unicode! However, suppose the key was the ASCII bytes 1234123412341234. Now,
f.write(str(secret))
will write b'1234123412341234' to the text file! Instead of 16 bytes, it now contains those 16 bytes + the b, and two ' quote characters; 19 bytes in total.
Or if you take a random binary string from os.urandom,
>>> os.urandom(16)
b'\xd7\x82K^\x7fe[\x9e\x96\xcb9\xbf\xa0\xd9s\xcb'
now, instead of writing 16 bytes D7, 82,.. and so forth, it now writes that string into the file. And the error occurs because the decryption tries to use
"b'\\xd7\\x82K^\\x7fe[\\x9e\\x96\\xcb9\\xbf\\xa0\\xd9s\\xcb'"
as the decryption key, which, when encoded as UTF-8 results in
b"b'\\xd7\\x82K^\\x7fe[\\x9e\\x96\\xcb9\\xbf\\xa0\\xd9s\\xcb'"
which is a 49-bytes long bytes value.
You have 2 good choices. Either you continue to write your key to a text file, but convert it to hex, or write the key into a binary file; then the file should be exactly the key length in bytes. I am going for the latter here:
Thus for storing the key, use
with open('aeskey.bin', 'wb') as keyfile:
keyfile.write(secret)
and
with open('aeskey.bin', 'rb') as keyfile:
key = keyfile.read()
Same naturally applies to the cipher text (that is the encrypted binary), you must write and read it to and from a binary file:
with open('ciphertext.bin', 'wb') as f:
f.write(encoded)
and
with open('ciphertext.bin', 'rb') as f:
encryptedString = f.read()
If you want to base64-encode it, do note that base64.b64encode/decode are bytes-in/bytes-out.
By the way, plaintext is the original, unencrypted text; the encrypted text is called ciphertext. AES is a cipher that can encrypt plaintext to ciphertext and decrypt ciphertext to plaintext using a key.
Despite these being called "-text" neither of them is textual data per se, as understood by Python, but they're binary data, and should be represented as bytes.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python UnicodeDecodeError: 'utf-8' codec can't decode byte - python

Related

AES encryption and padding across multiple blocks

cryptography.fernet.InvalidToken problem with cryptography

Encrypting a file using saved RSA keys in python

Encrypt data in PostgreSQL and decrypt the data in Python

ValueError: AES key must be either 16, 24, or 32 bytes long PyCrypto 2.7a1

Categories

Resources