Preamble:
I did search StackOverflow and I know someone had a question like mine, but now (a couple months after first seeing it), I could not find that answer again. I'm fully aware there is likely a duplicate, but for the life of me, I cannot find it in my searches.
The problem:
I have a cryptography module which generates a hashed password used for encrypting and decrypting secrets stored in a TinyDB database. All the functionality works except for decrypting the secret. The password is verifying correctly, so I know that isn't the issue. I am almost positive my issue is in getting the salt encoded properly in the decryption function.
Encryption code:
pas = use_password(args[0])
salt = urandom(16)
kdf = PBKDF2HMAC(algorithm=hashes.SHA256(),length=32,salt=salt,iterations=100000,)
sec = base64.b64encode(kdf.derive(bytes(pas,'utf-8')))
token = Fernet(sec).encrypt(bytes(key,'utf-8'))
salt = base64.b64encode(salt).decode('utf-8')
return token, salt
Decryption:
pas = use_password(pw)
***salt = base64.b64decode(salt)***
kdf = PBKDF2HMAC(algorithm=hashes.SHA256(),length=32,salt=salt,iterations=100000,)
sec = base64.b64encode(kdf.derive(bytes(pas,'utf-8')))
# BUG: Reported defects -#ctrenthem at 7/26/2021, 8:07:00 PM: cryptography.fernet.InvalidToken error
key = Fernet(sec).decrypt(bytes(token,'utf-8'))
return key # Returns a byte object which may need to be converted to a string.
What I tried:
I've already tried different variations of duplicating the salt encryption, but keep getting errors. Part of the problem is that base64.encode requires two "file objects" for input and output, and does not accept a string variable, which makes it unusable for my needs.
I could resolve this by creating a temp file, but that would be the worst solution because it involves
creating a new file in the filesystem with partially decrypted information, outside of RAM and the database, weakening the security of the whole system
Creates a new temp file, which is just a waste of system resources
Adds a whole lot more code just to implement something that I am positive can be implemented differently with one or two words.
Despite knowing that there is another solution, I cannot figure it out and am lost as to which base64 or bytes function will accomplish the job.
Related
Seems simple enough, and there are plenty of examples but I just can't seem to get hashes that verify with werkzeug.security's check_password_hash in python.
private string Generate_Passwd_Hash()
{
string _password = "Password";
string _salt = "cXoZSGKkuGWIbVdr";
SHA256 MyHash = SHA256.Create();
byte[] hashable = System.Text.Encoding.UTF8.GetBytes(_salt + _password);
byte[] resulthash = MyHash.ComputeHash(hashable);
return "sha256$" + _salt + "$" + BitConverter.ToString(resulthash).Replace("-", "").ToLower();
}
this should generate;
sha256$cXoZSGKkuGWIbVdr$7f5d63e849f0a2c0c5c2bd6ae4e45ead2ac730c853a1ed3460e227c06c567f49
but doesn't.
EDIT
Reading through the python code for generate_password_hash and it has a default number of iterations of 260000. Which is probably what I'm missing.
I never used werkzeug but I tried to reproduce your probelem.
I had read the docs of werkzeug.security.generate_password_hash and realized it is used in password validation only, and not meant to be a universal hashing algorithm.
The document clearly says
Hash a password with the given method and salt with a string of the
given length. The format of the string returned includes the method
that was used so that check_password_hash() can check the hash.
hashlib.pbkdf2_hmac is the hashing algorithm werkzeug uses internally(from now on we call it underlying algorithm). and you don't need install it because it is in standard library.
The source code of check_password_hash shows it generates a random salt before calling underlying algorithm. The salt is to protect from attacks. And it is remembered by the werkzeug framework so that check_password_hash can use to validate later.
So to summarize:
werkzeug.security.generate_password_hash only guarantee that generated hash can be validated by check_password_hash, and no more. You simply cannot(or not supposed to) try to generate same hash by other libraries or languages.
If you really want to compare the hashing algorithm in python and C#, please post another question(or update this question) that compares underlying algorithm(hashlib.pbkdf2_hmac which allow specifying salt as parameter) with C# version. Note seems in C# there's no built in algorithm for pbkdf2, see Hash Password in C#? Bcrypt/PBKDF2.
I'm using this to encrypt a file, and then to decrypt a file, using AES-GCM:
(do pip install pycryptodome first if not installed yet)
import Crypto.Random, Crypto.Protocol.KDF, Crypto.Cipher.AES
def cipherAES_GCM(pwd, nonce):
key = Crypto.Protocol.KDF.PBKDF2(pwd, nonce, count=100_000)
return Crypto.Cipher.AES.new(key, Crypto.Cipher.AES.MODE_GCM, nonce=nonce)
# encrypt
plaintext = b'HelloHelloHelloHelloHelloHelloHello' # in reality, read from a file
key = b'mykey'
nonce = Crypto.Random.new().read(16)
c, tag = cipherAES_GCM(key, nonce).encrypt_and_digest(plaintext)
ciphertext = nonce + tag + c # write ciphertext to disk as the "encrypted file"
# decrypt
nonce, tag, c = ciphertext[:16], ciphertext[16:32], ciphertext[32:] # read from the "encrypted file" on disk
plain = cipherAES_GCM(key, nonce).decrypt_and_verify(c, tag).decode()
print(plain) # HelloHelloHelloHelloHelloHelloHello
Is this considered a good encryption practice, and what the potential weaknesses of this file encryption implementation?
Remark: I have 10,000 files to encrypt. If each single time I encrypt a file, I call the KDF (with a high count value), this will be highly unefficient!
A better solution would be: call the KDF only once (with a nonce1), and then for each file do:
nonce2 = Crypto.Random.new().read(16)
cipher, tag = AES.new(key, AES.MODE_GCM, nonce=nonce2).encrypt_and_digest(plain)
But then does this mean I have to write nonce1 | nonce2 | ciphertext | tag to disk for each file? This adds an additional 16-byte nonce1 to each file...
A suggestion for improving your code would be to apply a 12 bytes nonce for GCM. Currently a 16 bytes nonce is used and this should be changed, see here sec. Note, and here.
Crucial for the security of GCM is that no key/nonce pair is used more than once, here. Since in your code for each encryption a random nonce is generated, this issue is prevented.
Your code applies the nonce also as salt for the key derivation, which is in principle no security problem as this does not lead to multiple use of the same key/nonce pair, here.
However, a disadvantage from this is possibly that the salt length is determined by the nonce length. If this is not desired (i.e. if e.g. a larger salt should be used), an alternative approach would be to generate a random salt for each encryption to derive both the key and nonce via the KDF, here. In this scenario, the concatenated data salt | ciphertext | tag would then be passed to the recipient. Another alternative would be to completely separate nonce and key generation and to generate for each encryption both a random nonce and a random salt for key generation. In this case the concatenated data salt | nonce | ciphertext | tag would have to be passed to the recipient. Note that like the nonce and the tag, also the salt is no secret, so that it can be sent along with the ciphertext.
The code applies an iteration count of 100,000. Generally, the following applies: The iteration count should be as high as can be tolerated for your environment, while maintaining acceptable performance, here. If 100,000 meets this criterion for your environment then this is OK.
The concatenation order you use is nonce | tag | ciphertext. This is not a problem as long as both sides know this. Often by convention, the nonce | ciphertext | tag order is used (e.g. Java implicitly appends the tag to the ciphertext), which could also be used in the code if you want to stick to this convention.
It is also important that an up-to-date, maintained library is used, which is the case with PyCryptodome (unlike its predecessor, the legacy PyCrypto, which should not be used at all).
Edit:
The PBKDF2 implementation of PyCryptodome uses by default 16 bytes for the length of the generated key, which corresponds to AES-128. For the digest HMAC/SHA1 is applied by default. The posted code uses these standard parameters, none of which are insecure, but can of course be changed if necessary, here.
Note: Although SHA1 itself is insecure, this does not apply in the context
of PBKDF2 or HMAC, here. However, to support the extinction of SHA1 from the ecosystem, SHA256 could be used.
Edit: (regarding the update of the question):
The use case presented in the edited question is the encryption of 10,000 files. The posted code is executed for each file, so that a corresponding number of keys are generated via the KDF which leads to a corresponding loss of perfomance. This is described by you as highly unefficient. However, it should not be forgotten that the current code focuses on security and less on performance. In my answer I pointed out that e.g. the iteration count is a parameter which allows tuning between performance and security within certain limits.
A PBKDF (password based key derivation function) allows to derive a key from a weak password. To keep the encryption secure, the derivation time is intentionally increased so that an attacker cannot crack the weak password faster than a strong key (ideally). If the derivation time is shortened (e.g. by decreasing the iteration count or by using the same key more than once) this generally leads to a security reduction. Or in short, a performance gain (by a faster PBKDF) generally reduces security. This results in a certain leeway for more performant (but weaker) solutions.
The more performant solution you suggest is the following: As before, a random nonce is generated for each file. But instead of encrypting each file with its own key, all files are encrypted with the same key. For this purpose, a random salt is generated once, with which this key is derived via the KDF. This does indeed mean a significant performance gain. However, this is automatically accompanied by a reduction in security: Should an attacker succeed in obtaining the key, the attacker can decrypt all files (and not just one as in the original scenario). However, this disadvantage is not a mandatory exclusion criterion if it is acceptable within the scope of your security requirements (which seems to be the case here).
The more performant solution requires that the information salt | nonce | ciphertext | tag must be sent to the recipient. The salt is important and must not be missing, because the recipient needs the salt to derive the key via the PBKDF. Once the recipient has determined the key, the ciphertext can be authenticated with the tag and decrypted using the nonce. If it has been agreed with the recipient that the same key will be used for each file, it is sufficient for the recipient to derive the key once via the PBKDF. Otherwise the key must be derived for each file.
If the salt with its 16 bytes is unwanted (since it is identical for all files in this approach), alternative architectures could be considered. For example, a hybrid scheme might be used: A random symmetric key is generated and exchanged using a public key infrastructure. Also here, all files can be encrypted with the same key or each file can be encrypted with its own key.
But for more specific suggestions for a design proposal, the use case should be described in more detail, e.g. regarding the files: How large are the files? Is processing in streams/chunks necessary? Or regarding the recipients: How many recipients are there? What is aligned with the recipients? etc.
This seems to be fine but I have a recommendation which is to not use same nonce for encryption and key derivation (nonce stands for key used only once using same nonce so you can pass the md5 hash of nonce to the encryption function instead if you dont want to use another nonce(IV). Second I think you can switch to cryptography if you are interested in better security . This is example code using cryptography module to encrypt which also has the advantage of encrypting using 128-bit key which is secure and it take care of the rest such as IV(nonces), decryption and verification(is done using HMAC). So all your code above can be summarized in this few lines which lead to less complexity so arguably more secure code.
from cryptography.fernet import Fernet
plaintext = b"hello world"
key = Fernet.generate_key()
ctx = Fernet(key)
ciphertext = ctx.encrypt(plaintext)
print(ciphertext)
decryption = ctx.decrypt(ciphertext)
print(decryption)
EDIT: Note that the nonce you use will also weaken up the key since the nonce is sent with ciphertext, now the salt used for PBKDF is pointless and now the attacker have to just guess your password(assuming using default count) which in this case is very simple one, brute forcing can take no longer than 26^5 tries(total of lowercase alphabets for total of length 5).
So, i'm writing a python program, actually my first "big" one, that, at a certain point, needs to generate an hmac with sha-512, from a message and a key.
Let's say i have:
key = 'ff'
seed = '238973:a665f3e641d9a402df3f9d5d9a236cb8061a2437ee47de8b3c8091774484b8b3:238973'
bkey = key.encode()
bseed = seed.encode()
string = hmac.new(bKey, bSeed, 'sha3_512')
digest = string.hexdigest()
print(str(digest))
>>>92b6c5ccc654c21838cb72e932de0feaf898398300e9755b2c91553c9db0bea99a5139055b5471142b299361c6ece4b51f2f26777a6f18f0db27775625e6948f
However when i go to https://www.freeformatter.com/hmac-generator.html#ad-output it gives me another result:
249667dabafbd3fa123b9749c6b23bc90c935e30e1372f26bbab2defa791fc915d60d406efe1b165befd4a1b6dc34fcf2f0975db08a7a2938964e99ea0419dcf
which i know for a fact to be the correct one, can somebody explain me why that is? I've been looking online for hours but it seems like i'm running in circles or i don't fully grasp how the hmac works.
I would greatly appreciate your help, it's the only bit i'm missing to finish this program.
sha3_512 is not the same implementation as sha512 - the first is based on SHA-3, while the second is based on the older SHA-2 standard. Use sha512 instead when creating your hmac to use the SHA-2 version:
>>> hmac.new(b'ff', b'238973:a665f3e641d9a402df3f9d5d9a236cb8061a2437ee47de8b3c8091774484b8b3:238973', hashlib.sha512).hexdigest()
'249667dabafbd3fa123b9749c6b23bc90c935e30e1372f26bbab2defa791fc915d60d406efe1b165befd4a1b6dc34fcf2f0975db08a7a2938964e99ea0419dcf'
So I've recently learned how to store passwords in a DB, that is by adding a salt to the plaintext password, hashing it, and then storing the hash.
I'm working on a really small Flask app to try all this out, but I'm having a problem with the password hashing and checking parts of the process. It seems that I"m ending up with two different hashes for the same input and I can't seem to figure out why.
I ran a little experiment in the interpreter to test things out.
>>> from os import urandom
>>> salt = urandom(32).encode('base-64')
>>> salt
'+3DejJpQZO9d8campsxOB6fam6lBE0mJ/+UvFf3oG8c=\n'
>>> plaintext_pw = 'archer'
>>> plaintext_pw
'archer'
>>> salted_pw = plaintext_pw + salt
>>> salted_pw
'archer+3DejJpQZO9d8campsxOB6fam6lBE0mJ/+UvFf3oG8c=\n'
>>> from flaskext.bcrypt import Bcrypt
>>> bc = Bcrypt(None)
>>> hashed_pw = bc.generate_password_hash(salted_pw)
>>> hashed_pw
'$2a$12$znMwqAw.GliVE8XFgMIiA.aEGU9iEZzZZWfxej5wSUFP0huyzdUfe'
All is working well at this point, but when I turn around and do this:
>>> bc.generate_password_hash(plaintext_pw + salt)
'$2a$12$qbywkEjuJgmBvXW6peHzAe.rWjoc.ybFKRNzuZhom2yJSXaMRcVTq'
I get a completely different hash, even though I started with the same plaintext_pw and salt. I thought that wasn't supposed to happen? Furthermore each subsequent call to bc.generate_password_hash() gives me different results each time:
>>> bc.generate_password_hash(plaintext_pw + salt)
'$2a$12$FAh9r4oaD40mWPtkClAnIOisP37eAT5m.i.EGV1zRAsPNbxg3BlX2'
>>> bc.generate_password_hash(plaintext_pw + salt)
'$2a$12$gluk9RUiR6D0e2p1J.hNgeE3iTFxDUlCNvFJOsCZZk89ngO.Z6/B6'
As far as I can tell plaintext_pw and salt aren't changing between calls. I can't seem to spot the error here, could someone explain to me exactly what's happening here, and what it is I'm doing wrong?
Ok so it looks like I've solved my problem. Turns out I wasn't using bcrypt properly. Here's what I learned:
The hashes were different each time I called generate_password_hash because bcrypt automatically generates a salt for you and appends it to the hashed password, so no need to generate it with urandom or store it separately.
I didn't talk about this in my post, but its worth noting here anyway - I assumed that on login you would need to call generate_password_hash() and provide the password from the login form to create a second hash for check_password_hash() to compare against, but that isn't necessary. check_password_hash() can be called with the stored hash and the form password (respectively) and it will automatically take care of salting and hashing the form password, and comparing it to the stored hash.
And with that everything is working fine now. Hope this helps someone else!
How do you store a password entered by the user in memory and erase it securely after it is no longer need?
To elaborate, currently we have the following code:
username = raw_input('User name: ')
password = getpass.getpass()
mail = imaplib.IMAP4(MAIL_HOST)
mail.login(username, password)
After calling the login method, what do we need to do to fill the area of memory that contains password with garbled characters so that someone cannot recover the password by doing a core dump?
There is a similar question, however it is in Java and the solution uses character arrays:
How does one store password hashes securely in memory, when creating accounts?
Can this be done in Python?
Python doesn't have that low of a level of control over memory. Accept it, and move on. The best you can do is to del password after calling mail.login so that no references to the password string object remain. Any solution that purports to be able to do more than that is only giving you a false sense of security.
Python string objects are immutable; there's no direct way to change the contents of a string after it is created. Even if you were able to somehow overwrite the contents of the string referred to by password (which is technically possible with stupid ctypes tricks), there would still be other copies of the password that have been created in various string operations:
by the getpass module when it strips the trailing newline off of the inputted password
by the imaplib module when it quotes the password and then creates the complete IMAP command before passing it off to the socket
You would somehow have to get references to all of those strings and overwrite their memory as well.
There actually -is- a way to securely erase strings in Python; use the memset C function, as per Mark data as sensitive in python
Edited to add, long after the post was made: here's a deeper dive into string interning. There are some circumstances (primarily involving non-constant strings) where interning does not happen, making cleanup of the string value slightly more explicit, based on CPython reference counting GC. (Though still not a "scrubbing" / "sanitizing" cleanup.)
The correct solution is to use a bytearray() ... which is mutable, and you can safely clear keys and sensitive material from RAM.
However, there are some libraries, notably the python "cryptography" library that prevent "bytearray" from being used. This is problematic... to some extent these cryptographic libraries should ensure that only mutable types be used for key material.
There is SecureString which is a pip module that allows you to fully remove a key from memory...(I refactored it a bit and called it SecureBytes). I wrote some unit tests that demonstrate that the key is fully removed.
But there is a big caveat: if someone's password is "type", then the word "type" will get wiped from all of python... including in function definitions and object attributes.
In other words... mutating immutable types is a terrible idea, and unless you're extremely careful, can immediately crash any running program.
The right solution is: never use immutable types for key material, passwords, etc. Anyone building a cryptographic library or routine like "getpass" should be working with a "bytearray" instead of python strings.
If you don't need the mail object to persist once you are done with it, I think your best bet is to perform the mailing work in a subprocess (see the subprocess module.) That way, when the subprocess dies, so goes your password.
This could be done using numpy chararray:
import numpy as np
username = raw_input('User name: ')
mail = imaplib.IMAP4(MAIL_HOST)
x = np.chararray((20,))
x[:] = list("{:<20}".format(raw_input('Password: ')))
mail.login(username, x.tobytes().strip())
x[:] = ''
You would have to determine the maximum size of password, but this should remove the data when it is overwritten.
EDIT: removed the bad advice...
You can also use arrays like the java example if you like, but just overwriting it should be enough.
http://docs.python.org/library/array.html
Store the password in a list, and if you just set the list to null, the memory of the array stored in the list is automatically freed.