A server protocol requires me to derive a password hash with a limited key size. This is the given JavaScript + CryptoJS implementation:
var params = {keySize: size/32, hasher: CryptoJS.algo.SHA512, iterations: 5000}
var output = CryptoJS.PBKDF2(password, salt, params).toString();
I want to re-implement this in Python using Passlib, i.e. something like
from passlib.hash import pkbdf2_sha512
output = pbkdf2_sha512.hash(password, salt=salt, rounds=5000)
The Passlib API does not allow me to specify the key size. How to do it though?
If the derived key it to long just truncate it to the required length. Each byte is just as valid as every other byte, it makes no difference which bytes you use, there is no ordering.
Related
I have a python code that generates a PBKDF2 sha1 hash of a password using the hashlib.pbkdf2_hmac method. Then I use that password digest in a dotnet framework 4.5 program to verify it against the same password. The C# program returns false which suggests that the hash produced from the python program is incorrect.
The key is in this format: #iterations|salt|key.
Then, I take that key and I try to verify it using a dotnet framework app using via method:
public static bool IsValid(string testPassword, string originalDelimitedHash)
{
//extract original values from delimited hash text
var originalHashParts = originalDelimitedHash.Split('|');
var origIterations = Int32.Parse(originalHashParts[0]);
var origSalt = Convert.FromBase64String(originalHashParts[1]);
var originalHash = originalHashParts[2];
//generate hash from test password and original salt and iterations
var pbkdf2 = new Rfc2898DeriveBytes(testPassword, origSalt, origIterations, HashAlgorithmName.SHA1);
byte[] testHash = pbkdf2.GetBytes(20);
var hashStr = Convert.ToBase64String(testHash);
if (hashStr == originalHash)
return true;
return false;
}
my python program:
from hashlib import pbkdf2_hmac
from base64 import b64encode
from os import urandom
def generate_password_hash(password:string):
encodedPass = password.encode('utf8')
random_bytes = urandom(20)
salt = b64encode(random_bytes)
iterations = 5000
key = pbkdf2_hmac('sha1', encodedPass, salt, iterations, dklen=20)
result = f'{iterations}|{salt.decode("utf-8")}|{binascii.hexlify(key).decode("utf-8")}'
return result
So if my password is hDHzJnMg0O the resulting digest from the above python method would be something like 5000|J5avBy0q5p9R/6cgxUpu6+6sW7o=|2445594504c9ffb54d1f11bbd0b385e3e37a5aca
So if I take that and supply it to my C# IsValid method (see below) it returns false which means the passwords do not match
static void Main(string[] args)
{
var pass = "hDHzJnMg0O";
var hash = "5000|J5avBy0q5p9R/6cgxUpu6+6sW7o=|2445594504c9ffb54d1f11bbd0b385e3e37a5aca";
var isValid = IsValid(pass, hash); // returns False
}
The Python code:
uses b64encode(random_bytes) as salt for the PBKDF2 call. This is rather unusual (but not a bug). Typically the raw data, i.e. random_bytes, is applied as salt and passed to the PBKDF2 call. With the Base64 encoding only the string would be created.
hex encodes the key (i.e. the return value of the PBKDF2 call).
The C# code is different in these points and:
uses the raw data (i.e. random_bytes from the Python side) for the PBKDF2 call, i.e. the salt from the Python side is Base64 decoded.
Base64 encodes the key (i.e. the return value of the PBKDF2 call)
Changes in the C# code for compatibility with the Python code (of course the changes could also be made in the Python code, but the Python code seems to be the reference):
...
var origSalt = Encoding.UTF8.GetBytes(originalHashParts[1]); // Convert.FromBase64String(originalHashParts[1]);
...
var hashStr = Convert.ToHexString(testHash); // Convert.ToBase64String(testHash);
...
For the latter, Convert.ToHexString() was used, which is available since .NET 5. For other .NET versions see e.g. here.
Furthermore, since the hex encoded values are compared and the different implementations are not standardized regarding lower (e.g. binascii.hexlify(key)) and upper case letters (e.g. Convert.ToHexString(testHash)), it is more robust to convert both strings uniformly, e.g.:
if (hashStr.ToUpper() == originalHash.ToUpper())
return true;
With these changes, validation with the C# code works.
Edit (with regard to the change in the Python code addressed in the comment):
If in the Python code random_bytes is used as salt and the salt is Base64 encoded for concatenation, then in the C# code the Base64 encoded salt must be Base64 decoded again (as in the original C# code).
Seems simple enough, and there are plenty of examples but I just can't seem to get hashes that verify with werkzeug.security's check_password_hash in python.
private string Generate_Passwd_Hash()
{
string _password = "Password";
string _salt = "cXoZSGKkuGWIbVdr";
SHA256 MyHash = SHA256.Create();
byte[] hashable = System.Text.Encoding.UTF8.GetBytes(_salt + _password);
byte[] resulthash = MyHash.ComputeHash(hashable);
return "sha256$" + _salt + "$" + BitConverter.ToString(resulthash).Replace("-", "").ToLower();
}
this should generate;
sha256$cXoZSGKkuGWIbVdr$7f5d63e849f0a2c0c5c2bd6ae4e45ead2ac730c853a1ed3460e227c06c567f49
but doesn't.
EDIT
Reading through the python code for generate_password_hash and it has a default number of iterations of 260000. Which is probably what I'm missing.
I never used werkzeug but I tried to reproduce your probelem.
I had read the docs of werkzeug.security.generate_password_hash and realized it is used in password validation only, and not meant to be a universal hashing algorithm.
The document clearly says
Hash a password with the given method and salt with a string of the
given length. The format of the string returned includes the method
that was used so that check_password_hash() can check the hash.
hashlib.pbkdf2_hmac is the hashing algorithm werkzeug uses internally(from now on we call it underlying algorithm). and you don't need install it because it is in standard library.
The source code of check_password_hash shows it generates a random salt before calling underlying algorithm. The salt is to protect from attacks. And it is remembered by the werkzeug framework so that check_password_hash can use to validate later.
So to summarize:
werkzeug.security.generate_password_hash only guarantee that generated hash can be validated by check_password_hash, and no more. You simply cannot(or not supposed to) try to generate same hash by other libraries or languages.
If you really want to compare the hashing algorithm in python and C#, please post another question(or update this question) that compares underlying algorithm(hashlib.pbkdf2_hmac which allow specifying salt as parameter) with C# version. Note seems in C# there's no built in algorithm for pbkdf2, see Hash Password in C#? Bcrypt/PBKDF2.
I have a 10 character code that I want to sign by my python program, then put both the code as well as the signature in an URL, which then get's processed by a PHP SLIM API. Here the signature should get verified.
I generated my RSA keys in python like this:
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.serialization import load_pem_private_key
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives import serialization
def gen_key():
private_key = rsa.generate_private_key(
public_exponent=65537, key_size=2048, backend=default_backend()
)
return private_key
def save_key(pk):
pem_priv = pk.private_bytes(
encoding=serialization.Encoding.PEM,
format=serialization.PrivateFormat.PKCS8,
encryption_algorithm=serialization.NoEncryption()
)
with open(os.path.join('.', 'private_key.pem'), 'wb') as pem_out:
pem_out.write(pem_priv)
pem_pub = pk.public_key().public_bytes(
encoding=serialization.Encoding.PEM,
format=crypto_serialization.PublicFormat.SubjectPublicKeyInfo
)
with open(os.path.join('.', 'public_key.pem'), 'wb') as pem_out:
pem_out.write(pem_pub)
def main():
priv_key = gen_key()
save_key(priv_key)
I sign the key like this in python:
private_key = load_key()
pub_key = private_key.public_key()
code = '09DD57CE10'
signature = private_key.sign(
str.encode(code),
padding.PSS(
mgf=padding.MGF1(hashes.SHA256()),
salt_length=padding.PSS.MAX_LENGTH
),
hashes.SHA256()
)
The url is built like this
my_url = 'https://www.exmaple.com/codes?code={}&signature={}'.format(
code,
signature.hex()
)
Because the signature is a byte object I'm converting it to a string using the .hex() function
Now, in PHP, I am trying to verify the code and signature:
use phpseclib3\Crypt\PublicKeyLoader;
$key = PublicKeyLoader::load(file_get_contents(__DIR__ . "/public_key.pem"));
echo $key->verify($code, pack('h*', $signature)) ? 'yes' : 'no';
I also tried using PHP openssl_verify
$pub_key = file_get_contents(__DIR__ . "/public_key.pem");
$res = openssl_verify($code, pack('n*', $signature), $pub_key, OPENSSL_ALGO_SHA256);
However, it always tells me the signature is wrong, when I obviously know, that in general it is the correct signature. The RSA keys are all the correct and same keys in both python and php.
I think the issue is with the signature and how I had to convert it to a string and then back to a bytes like string in both python and php.
The Python code uses PSS.MAX_LENGTH as the salt length. This value denotes the maximum salt length and is recommended in the Cryptography documentation (s. here):
salt_length (int) – The length of the salt. It is recommended that this be set to PSS.MAX_LENGTH
In RFC8017, which specifies PKCS#1 and thus also PSS, the default value of the salt length is defined as the output length of the hash (s. A.2.3. RSASSA-PSS):
For a given hashAlgorithm, the default value of saltLength is the octet length of the hash value.
Most libraries, e.g. PHPSECLIB, apply for the default value of the salt length the default defined in RFC8017, i.e. the output length of the hash (s. here). Therefore the maximum salt length must be set explicitly. The maximum salt length is given by (s. here):
signature length (bytes) - digest output length (bytes) - 2 = 256 - 32 - 2 = 222
for a 2048 bits key and SHA256.
Thus, the verification in the PHP code must be changed as follows:
$verified = $key->
withPadding(RSA::SIGNATURE_PSS)->
//withHash('sha256')-> // default
//withMGFHash('sha256')-> // default
withSaltLength(256-32-2)-> // set maximum salt length
verify($code, pack('H*', $signature)); // alternatively hex2bin()
Note that in the posted code of the question h (hex string, low nibble first) is specified in the format string of pack(). I' ve chosen the more common H (hex string, high nibble first) in my code snippet which is also compatible with Python's hex(). Ultimately, the format string to choose depends on the encoding applied in the Python code.
Using this change, on my machine, the signature generated with the Python code can be successfully verified with the PHP code.
Alternatively, of course, the salt length of the Python code can be adapted to the output length of the digest (32 bytes in this case).
By the way, a verification with openssl_verify() is not possible, because PSS is not supported.
I'm using this to encrypt a file, and then to decrypt a file, using AES-GCM:
(do pip install pycryptodome first if not installed yet)
import Crypto.Random, Crypto.Protocol.KDF, Crypto.Cipher.AES
def cipherAES_GCM(pwd, nonce):
key = Crypto.Protocol.KDF.PBKDF2(pwd, nonce, count=100_000)
return Crypto.Cipher.AES.new(key, Crypto.Cipher.AES.MODE_GCM, nonce=nonce)
# encrypt
plaintext = b'HelloHelloHelloHelloHelloHelloHello' # in reality, read from a file
key = b'mykey'
nonce = Crypto.Random.new().read(16)
c, tag = cipherAES_GCM(key, nonce).encrypt_and_digest(plaintext)
ciphertext = nonce + tag + c # write ciphertext to disk as the "encrypted file"
# decrypt
nonce, tag, c = ciphertext[:16], ciphertext[16:32], ciphertext[32:] # read from the "encrypted file" on disk
plain = cipherAES_GCM(key, nonce).decrypt_and_verify(c, tag).decode()
print(plain) # HelloHelloHelloHelloHelloHelloHello
Is this considered a good encryption practice, and what the potential weaknesses of this file encryption implementation?
Remark: I have 10,000 files to encrypt. If each single time I encrypt a file, I call the KDF (with a high count value), this will be highly unefficient!
A better solution would be: call the KDF only once (with a nonce1), and then for each file do:
nonce2 = Crypto.Random.new().read(16)
cipher, tag = AES.new(key, AES.MODE_GCM, nonce=nonce2).encrypt_and_digest(plain)
But then does this mean I have to write nonce1 | nonce2 | ciphertext | tag to disk for each file? This adds an additional 16-byte nonce1 to each file...
A suggestion for improving your code would be to apply a 12 bytes nonce for GCM. Currently a 16 bytes nonce is used and this should be changed, see here sec. Note, and here.
Crucial for the security of GCM is that no key/nonce pair is used more than once, here. Since in your code for each encryption a random nonce is generated, this issue is prevented.
Your code applies the nonce also as salt for the key derivation, which is in principle no security problem as this does not lead to multiple use of the same key/nonce pair, here.
However, a disadvantage from this is possibly that the salt length is determined by the nonce length. If this is not desired (i.e. if e.g. a larger salt should be used), an alternative approach would be to generate a random salt for each encryption to derive both the key and nonce via the KDF, here. In this scenario, the concatenated data salt | ciphertext | tag would then be passed to the recipient. Another alternative would be to completely separate nonce and key generation and to generate for each encryption both a random nonce and a random salt for key generation. In this case the concatenated data salt | nonce | ciphertext | tag would have to be passed to the recipient. Note that like the nonce and the tag, also the salt is no secret, so that it can be sent along with the ciphertext.
The code applies an iteration count of 100,000. Generally, the following applies: The iteration count should be as high as can be tolerated for your environment, while maintaining acceptable performance, here. If 100,000 meets this criterion for your environment then this is OK.
The concatenation order you use is nonce | tag | ciphertext. This is not a problem as long as both sides know this. Often by convention, the nonce | ciphertext | tag order is used (e.g. Java implicitly appends the tag to the ciphertext), which could also be used in the code if you want to stick to this convention.
It is also important that an up-to-date, maintained library is used, which is the case with PyCryptodome (unlike its predecessor, the legacy PyCrypto, which should not be used at all).
Edit:
The PBKDF2 implementation of PyCryptodome uses by default 16 bytes for the length of the generated key, which corresponds to AES-128. For the digest HMAC/SHA1 is applied by default. The posted code uses these standard parameters, none of which are insecure, but can of course be changed if necessary, here.
Note: Although SHA1 itself is insecure, this does not apply in the context
of PBKDF2 or HMAC, here. However, to support the extinction of SHA1 from the ecosystem, SHA256 could be used.
Edit: (regarding the update of the question):
The use case presented in the edited question is the encryption of 10,000 files. The posted code is executed for each file, so that a corresponding number of keys are generated via the KDF which leads to a corresponding loss of perfomance. This is described by you as highly unefficient. However, it should not be forgotten that the current code focuses on security and less on performance. In my answer I pointed out that e.g. the iteration count is a parameter which allows tuning between performance and security within certain limits.
A PBKDF (password based key derivation function) allows to derive a key from a weak password. To keep the encryption secure, the derivation time is intentionally increased so that an attacker cannot crack the weak password faster than a strong key (ideally). If the derivation time is shortened (e.g. by decreasing the iteration count or by using the same key more than once) this generally leads to a security reduction. Or in short, a performance gain (by a faster PBKDF) generally reduces security. This results in a certain leeway for more performant (but weaker) solutions.
The more performant solution you suggest is the following: As before, a random nonce is generated for each file. But instead of encrypting each file with its own key, all files are encrypted with the same key. For this purpose, a random salt is generated once, with which this key is derived via the KDF. This does indeed mean a significant performance gain. However, this is automatically accompanied by a reduction in security: Should an attacker succeed in obtaining the key, the attacker can decrypt all files (and not just one as in the original scenario). However, this disadvantage is not a mandatory exclusion criterion if it is acceptable within the scope of your security requirements (which seems to be the case here).
The more performant solution requires that the information salt | nonce | ciphertext | tag must be sent to the recipient. The salt is important and must not be missing, because the recipient needs the salt to derive the key via the PBKDF. Once the recipient has determined the key, the ciphertext can be authenticated with the tag and decrypted using the nonce. If it has been agreed with the recipient that the same key will be used for each file, it is sufficient for the recipient to derive the key once via the PBKDF. Otherwise the key must be derived for each file.
If the salt with its 16 bytes is unwanted (since it is identical for all files in this approach), alternative architectures could be considered. For example, a hybrid scheme might be used: A random symmetric key is generated and exchanged using a public key infrastructure. Also here, all files can be encrypted with the same key or each file can be encrypted with its own key.
But for more specific suggestions for a design proposal, the use case should be described in more detail, e.g. regarding the files: How large are the files? Is processing in streams/chunks necessary? Or regarding the recipients: How many recipients are there? What is aligned with the recipients? etc.
This seems to be fine but I have a recommendation which is to not use same nonce for encryption and key derivation (nonce stands for key used only once using same nonce so you can pass the md5 hash of nonce to the encryption function instead if you dont want to use another nonce(IV). Second I think you can switch to cryptography if you are interested in better security . This is example code using cryptography module to encrypt which also has the advantage of encrypting using 128-bit key which is secure and it take care of the rest such as IV(nonces), decryption and verification(is done using HMAC). So all your code above can be summarized in this few lines which lead to less complexity so arguably more secure code.
from cryptography.fernet import Fernet
plaintext = b"hello world"
key = Fernet.generate_key()
ctx = Fernet(key)
ciphertext = ctx.encrypt(plaintext)
print(ciphertext)
decryption = ctx.decrypt(ciphertext)
print(decryption)
EDIT: Note that the nonce you use will also weaken up the key since the nonce is sent with ciphertext, now the salt used for PBKDF is pointless and now the attacker have to just guess your password(assuming using default count) which in this case is very simple one, brute forcing can take no longer than 26^5 tries(total of lowercase alphabets for total of length 5).
I want to be able to generate and re-generate the same RSA keys from a password (and salt) alone in python.
Currently I was doing it using pycrypto, however, it does not seem to generate the same exact keys from the password alone. The reason seems to be that when pycrypto generates a RSA key it uses some sort of random number internally.
Currently my code looks as follows:
import DarkCloudCryptoLib as dcCryptoLib #some costume library for crypto
from Crypto.PublicKey import RSA
password = "password"
new_key1 = RSA.generate(1024) #rsaObj
exportedKey1 = new_key1.exportKey('DER', password, pkcs=1)
key1 = RSA.importKey(exportedKey1)
new_key2 = RSA.generate(1024) #rsaObj
exportedKey2 = new_key2.exportKey('DER', password, pkcs=1)
key2 = RSA.importKey(exportedKey2)
print dcCryptoLib.equalRSAKeys(key1, key2) #wish to return True but it doesn't
I don't really care if I have to not use pycrypto, as long as I can generate these RSA keys from passwords and salts alone.
Thanks for the help in advance.
Just for reference, this is how dcCryptoLib.equalRSAKeys(key1, key2) function looks like:
def equalRSAKeys(rsaKey1, rsaKey2):
public_key = rsaKey1.publickey().exportKey("DER")
private_key = rsaKey1.exportKey("DER")
pub_new_key = rsaKey2.publickey().exportKey("DER")
pri_new_key = rsaKey2.exportKey("DER")
boolprivate = (private_key == pri_new_key)
boolpublic = (public_key == pub_new_key)
return (boolprivate and boolpublic)
NOTE: Also, I am only using RSA for authentication. So any solution that provides a way of generating secure asymmetric signatures/verifying generated from passwords are acceptable solutions for my application. Though, generating RSA keys from passwords I feel, is a question that should also be answered as it seems useful if used correctly.
If you're trying to implement an authenticated encryption scheme using a shared password, you don't really need an RSA key: all you need is an AES key for encryption and an HMAC key for authentication.
If you do need to generate an asymmetric signature than can be verified without knowing the password, you're going to have to somehow generate RSA (or DSA, etc.) keys in a deterministic manner based on the password. Based on the documentation, this should be possible by defining a custom randfunc, something like this:
from Crypto.Protocol.KDF import PBKDF2
from Crypto.PublicKey import RSA
password = "swordfish" # for testing
salt = "yourAppName" # replace with random salt if you can store one
master_key = PBKDF2(password, salt, count=10000) # bigger count = better
def my_rand(n):
# kluge: use PBKDF2 with count=1 and incrementing salt as deterministic PRNG
my_rand.counter += 1
return PBKDF2(master_key, "my_rand:%d" % my_rand.counter, dkLen=n, count=1)
my_rand.counter = 0
RSA_key = RSA.generate(2048, randfunc=my_rand)
I've tested this, and it does generate deterministic RSA keys (as long as you remember to reset the counter, at least). However, note that this is not 100% future-proof: the generated keys might change, if the pycrypto RSA key generation algorithm is changed in some way.
In either case, you'll almost certainly want to preprocess your password using a slow key-stretching KDF such as PBKDF2, with an iteration count as high as you can reasonably tolerate. This makes breaking your system by brute-force password guessing considerably less easy. (Of course, you still need to use strong passwords; no amount of key-stretching is going to help if your password is abc123.)
Pass "randfunc" to the RSA.generate, and randfunc should return the output bytes, in order, of a well-known key derivation function that has been configured with enough output bits for RSA to "always complete" without needing more bits.
Argon2, scrypt, PBKDF2 are examples of KDFs designed for this purpose.
It may be possible to use Keccak directly as a KDF by specifying a high number of output bits.
If your generation function follows a well known standard closely, it should work across multiple implementations.