Python Password Hashing with hashlib

Python Password Hashing with hashlib - python

I'm trying to hash a password for a login system I am creating. I am using the hashlib import and using the blake2b hash algorithm. I can't seem to figure out how to hash a variable such as passwordEntry. All the hashlib examples are just of blake2b hashing characters. For example: blake2b(b'IWantToHashThis') I am quite confused on why the "b" letter has to be included in the hash. If I try to hash a variable the "b" letter can't be concluded with the variable I want to hash. Example of me trying to hash a variable: blake2b(passwordEntry) Another example of me trying to hash the variable: blake2b(b passwordEntry) On the second example I just gave hashlib thinks that it is trying to hash the variable "b passwordEntry." Like I said before the "b" letter has to be included in the hashing algorithm for it to preform correctly. Sorry for the long question if it is hard to follow I understand.

The letter b only works before quotes, [", ', """, ''''].
And it is there to notate that this string is bytes.
If you want to convert your string to bytes you can do that by
b"string" or "string".encode(). However, in your case you can only use the encode() method of str since b only works for Literal Strings.
So in your case it will be blake2b(passwordEntry.encode())

Related

After receiving the input in hexadecimal I would like to know how to output the SHA1 hash function

If 'a' is hashed,
import hashlib
hash = hashlib.sha1(b'aa')
hex_Hash = hash.hexdigest()
print(hex_Hash)
I wrote the above code, but the desired result (38469e8ea8e72d0b889f1905195e2f4b79b5bb50) does not come out. How should I write the code?

I have no idea where you got adc83b...fc from, but it is the sha-1 hash of the byte string containing a single newline: b'\n'.
>>> hashlib.sha1(b'\n').hexdigest()
'adc83b19e793491b1c6ea0fd8b46cd9f32e592fc'
========
Based on the comments below, it seems the original poster is actually asking about. The hexdigest was just intended to be an example of the reverse of what they were looking for. I think.
bytearray.fromhex('aa')

Reusing hashlib.md5 calculates different values for identical strings

This is my first test code:
import hashlib
md5Hash = hashlib.md5()
md5Hash.update('Coconuts')
print md5Hash.hexdigest()
md5Hash.update('Apples')
print md5Hash.hexdigest()
md5Hash.update('Oranges')
print md5Hash.hexdigest()
And this is my second chunk of code:
import hashlib
md5Hash = hashlib.md5()
md5Hash.update('Coconuts')
print md5Hash.hexdigest()
md5Hash.update('Bananas')
print md5Hash.hexdigest()
md5Hash.update('Oranges')
print md5Hash.hexdigest()
But the output for 1st code is:
0e8f7761bb8cd94c83e15ea7e720852a
217f2e2059306ab14286d8808f687abb
4ce7cfed2e8cb204baeba9c471d48f07
And for the second code is:
0e8f7761bb8cd94c83e15ea7e720852a
a82bf69bf25207f2846c015654ae68d1
47dba619e1f3eaa8e8a01ab93c79781e
I replaced the second string from 'Apples' to 'Bananas' and the third string still remains same. But still I am getting a different result for third string. Hashing supposed to have a same result everytime.
Am I missing something?

hashlib.md5.update() adds data to the hash. It doesn't replace the existing values; if you want to hash a new value, you need to initialize a new hashlib.md5 object.
The values you're hashing are:
"Coconuts" -> 0e8f7761bb8cd94c83e15ea7e720852a
"CoconutsApples" -> 217f2e2059306ab14286d8808f687abb
"CoconutsApplesOranges" -> 4ce7cfed2e8cb204baeba9c471d48f07
"Coconuts" -> 0e8f7761bb8cd94c83e15ea7e720852a
"CoconutsBananas" -> a82bf69bf25207f2846c015654ae68d1
"CoconutsBananasOranges" -> 47dba619e1f3eaa8e8a01ab93c79781e

Because you're using update method, md5Hash object is reused for the 3 strings. So it's basically the hash of the 3 strings concatenated together. So changing the second string changes the outcome for the 3rd print as well.
You need to declare a separate md5 object for each string. Use a loop (and python 3 compliant code needs the bytes prefix BTW, and also works in python 2):
import hashlib
for s in (b'Coconuts',b'Bananas',b'Oranges'):
md5Hash = hashlib.md5(s) # no need for update, pass data at construction
print(md5Hash.hexdigest())
result:
0e8f7761bb8cd94c83e15ea7e720852a
1ee31b77d0697c36914b99d1428f7f32
62f2b77089fea4c595e895901b63c10b
note that the values are now different, but at least it is the MD5 of each string, computed independently.

Expected result
What you are expecting is generally what you should be expecting from common cryptographic libraries. In most cryptographic libraries the hash object is reset after calling a method that finalizes the calculation such as hexdigest. It seems that hashlib.md5 uses alternate behavior.
Result by hashlib.md5
MD5 requires the input to be padded with a 1 bit, zero or more 0 bits and the length of the input in bits. Then the final hash value is calculated. hashlib.md5 internally seems to perform the final calculation using separate variables, keeping the state after hashing each string without this final padding.
So the result of your hashes is the concatenation of the earlier strings with the given string, followed by the correct padding, as duskwulf pointed out in his answer.
This is correctly documented by hashlib:
hash.digest()
Return the digest of the strings passed to the update() method so far. This is a string of digest_size bytes which may contain non-ASCII characters, including null bytes.
and
hash.hexdigest()
Like digest() except the digest is returned as a string of double length, containing only hexadecimal digits. This may be used to exchange the value safely in email or other non-binary environments.
Solution for hashlib.md5
As there doesn't seem to be a reset() method you should create a new md5 object for each separate hash value you want to create. Fortunately the hash objects themselves are relatively lightweight (even if the hashing itself isn't) so this won't consume many CPU or memory resources.
Discussion of the differences
For hashing itself resetting the hash in the finalizer may not make all that much sense. But it does matter for signature generation: you might want to initialize the same signature instance and then generate multiple signatures with it. The hash function should reset so it can calculate the signature over multiple messages.
Sometimes an application requires a congregated hash over multiple inputs, including intermediate hash results. In that case however a Merkle tree of hashes is used, where the intermediate hashes themselves are hashed again.
As indicated, I consider this is bad API design by the authors of hashlib. For cryptographers it certainly doesn't follow the rule of least surprise.

Comparing Python Hashes

I want to compare a hash of my password to a hash of what the user typed in, with (str)(hashlib.md5(pw.encode('utf-8')).hexdigest()).
The hash of the password is b'¥_ÆMÐ1;2±*öªÝ='. However, when I run the above code, I get b'\xa5\x83_\xc6\x85M\xd01;2\xb1*\xf6\xaa\xdd='.
For this reason, I can't compare these two strings. I'm looking for a function that can convert b'\xa5\x83_\xc6\x85M\xd01;2\xb1*\xf6\xaa\xdd=' to b'¥_ÆMÐ1;2±*öªÝ=' logically (each of the escape codes to its Unicode counterpart).
(The hash is of "lenny" if it helps. Here is a link to my code.)

Use .digest() instead of .hexdigest() if you want the raw bytes from the hash context.
edit, line 14 from your pastebin should be:
if hashlib.md5(lol.encode('utf-8')).digest() == b'\xa5\x83_\xc6\x85M\xd01;2\xb1*\xf6\xaa\xdd=':

Verify Python Passlib generated PBKDF2 SHA512 Hash in .NET

I am migrating a platform which used Passlib 1.6.2 to generate password hashes. The code to encrypt the password is (hash is called with default value for rounds):
from passlib.hash import pbkdf2_sha512 as pb
def hash(cleartext, rounds=10001):
return pb.encrypt(cleartext, rounds=rounds)
The output format looks like (for the password "Patient3" (no quotes)):
$pbkdf2-sha512$10001$0dr7v7eWUmptrfW.9z6HkA$w9j9AMVmKAP17OosCqDxDv2hjsvzlLpF8Rra8I7p/b5746rghZ8WrgEjDpvXG5hLz1UeNLzgFa81Drbx2b7.hg
And "Testing123"
$pbkdf2-sha512$10001$2ZuTslYKAYDQGiPkfA.B8A$ChsEXEjanEToQcPJiuVaKk0Ls3n0YK7gnxsu59rxWOawl/iKgo0XSWyaAfhFV0.Yu3QqfehB4dc7yGGsIW.ARQ
I can see that represents:
Algorithm SHA512
Iterations 10001
Salt 0dr7v7eWUmptrfW.9z6HkA (possibly)
The Passlib algorithm is defined on their site and reads:
All of the pbkdf2 hashes defined by passlib follow the same format, $pbkdf2-digest$rounds$salt$checksum.
$pbkdf2-digest$ is used as the Modular Crypt Format identifier ($pbkdf2-sha256$ in the example).
digest - this specifies the particular cryptographic hash used in conjunction with HMAC to form PBKDF2’s pseudorandom function for that particular hash (sha256 in the example).
rounds - the number of iterations that should be performed. this is encoded as a positive decimal number with no zero-padding (6400 in the example).
salt - this is the adapted base64 encoding of the raw salt bytes passed into the PBKDF2 function.
checksum - this is the adapted base64 encoding of the raw derived key bytes returned from the PBKDF2 function. Each scheme uses the digest size of its specific hash algorithm (digest) as the size of the raw derived key. This is enlarged by approximately 4/3 by the base64 encoding, resulting in a checksum size of 27, 43, and 86 for each of the respective algorithms listed above.
I found passlib.net which looks a bit like an abandoned beta and it uses '$6$' for the algorithm. I could not get it to verify the password. I tried changing the algorithm to $6$ but I suspect that in effect changes the salt as well.
I also tried using PWDTK with various values for salt and hash, but it may have been I was splitting the shadow password incorrectly, or supplying $ in some places where I should not have been.
Is there any way to verify a password against this hash value in .NET? Or another solution which does not involve either a Python proxy or getting users to resupply a password?

The hash is verified by passing the password into the PBKDF HMAC-SHA-256 hash method and then comparing the resulting hash to the saved hash portion, converted back from the Base64 version.
Saved hash to binary, then separate the hash
Convert the password to binary using UTF-8 encoding
PBKDF2,HMAC,SHA-256(toBinary(password, salt, 10001) == hash
Password: "Patient3"
$pbkdf2 - sha512$10001$0dr7v7eWUmptrfW.9z6HkA$w9j9AMVmKAP17OosCqDxDv2hjsvzlLpF8Rra8I7p/b5746rghZ8WrgEjDpvXG5hLz1UeNLzgFa81Drbx2b7.hg
Breaks down to (with the strings converted to standard Base64 (change '.' to '+' and add trailing '=' padding:
pbkdf2 - sha512
10001
0dr7v7eWUmptrfW+9z6HkA==
w9j9AMVmKAP17OosCqDxDv2hjsvzlLpF8Rra8I7p/b5746rghZ8WrgEjDpvXG5hLz1UeNLzgFa81Drbx2b7+hg==
Decoded to hex:
D1DAFBBFB796526A6DADF5BEF73E8790
C3D8FD00C5662803F5ECEA2C0AA0F10EFDA18ECBF394BA45F11ADAF08EE9FDBE7BE3AAE0859F16AE01230E9BD71B984BCF551E34BCE015AF350EB6F1D9BEFE86
Which makes sense: 16-byte (128-bit) salt and 64-byte (512-bit) SHA-512 hash.
Converting "Patient3" using UTF-8 to a binary array
Converting the salt from a modified BASE64 encoding to a 16 byte binary array
Using an iteration count od 10001
Feeding this to PBKDF2 using HMAC with SHA-512
I get
C3D8FD00C5662803F5ECEA2C0AA0F10EFDA18ECBF394BA45F11ADAF08EE9FDBE7BE3AAE0859F16AE01230E9BD71B984BCF551E34BCE015AF350EB6F1D9BEFE86
Which when Base64 encoded, replacing '+' characters with '.' and stripping the trailing '=' characters returns:
w9j9AMVmKAP17OosCqDxDv2hjsvzlLpF8Rra8I7p/b5746rghZ8WrgEjDpvXG5hLz1UeNLzgFa81Drbx2b7.hg

I quickly knocked together a .NET implementation using zaph's logic and using the code from JimmiTh on SO answer. I have put the code on GitHub (this is not supposed to be production ready). It appears to work with more than a handful of examples from our user base.
As zaph said the logic was:
Split the hash to find the iteration count, salt and hashed password. (I have assumed the algorithm, but you'd verify it). You'll have an array of 5 values containing [0] - Nothing, [1] - Algorithm, [2] - Iterations, [3] - Salt and [4] - Hash
Turn the salt into standard Base64 encoding by replacing any '.' characters with '+' characters and appending "==".
Pass the password, salt and iteration count to the PBKDF2-HMAC-SHA512 generator.
Convert back to the original base64 format by replacing any '+' characters with '.' characters and stripping the trailing "==".
Compare to the original hash (element 4 in the split string) to this converted value and if they're equal you've got a match.

MD5 Encrypting Woes

I've written an encrypting program in Python, one of my options is an md5 encryption. When i run a known string through my md5 encryption I receive a different hash value then if I run the EXACT same string through an md5 encryption website or cryptofox for firefox.
eg. my programs hash output - fe9c25d61e56054ea87703e30c672d91 - plaintext: g4m3
eg. online hash / cryptofox - 26e4477a0fa9cb24675379331dba9c84 - plaintext: g4m3
EXACT same word, 2 different hash values.
now heres my code snipet:
word="g4m3"
string=md5.new(word).hexdigest()
print string

You included a newline in your MD5 input string:
>>> import md5
>>> word="g4m3"
>>> md5.new(word).hexdigest() # no newline
'26e4477a0fa9cb24675379331dba9c84'
>>> md5.new(word + '\n').hexdigest() # with a newline
'fe9c25d61e56054ea87703e30c672d91'
When reading data from a file, make sure you remove the newline character at the end of the lines. You can use .rstrip('\n') to just remove line separation characters from the end of the line, or use .strip() to remove all whitespace from start or end of the line:
>>> word = 'g4m3\n'
>>> md5.new(word).hexdigest()
'fe9c25d61e56054ea87703e30c672d91'
>>> word = word.strip()
>>> md5.new(word).hexdigest()
'26e4477a0fa9cb24675379331dba9c84'

As to your question: Hashing is very sensitive. Even a single character of difference can result in a radically different output string. It may be the case that the online implementation is appending a whitespace char, or more likely, a newline. This extra character will change the output of the algorithm. (It's also possible the opposite is happening: you are appending a newline and the online one is not)
As to MD5 "encryption":
MD5 is NOT encryption. It is hashing. Encryption takes in data and a key, and spits out random data. This data is recoverable. Hashing, on the other hand, takes in data and spits out a finite amount of data that REPRESENTS the original data. The original data, however, unless stored elsewhere, is lost.
More information for reference:
Another interesting difference is the data the various types of algorithms spit out. Encryption can take in any amount of data (within the scope of the OS/software of course) and will output a bunch of data appx. equal in size to the input data. Hashing, however, will not. Since it is a mere representation of the data, it has a limited output. This can pose problems. For instance, if you had an infinite amount of data, eventually, two entirely different pieces of data would have the same hash. For this reason, when using hashing to compare two different values, it is usually a good idea to compare two separate hashes as well. The statistical probability that two separate pieces of data having TWO EQUAL HASHES is astronomically low.
Of course, then you get into hashing algorithms that utilize encryption methods at their core, but I won't go into that here.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.