I want to compare a hash of my password to a hash of what the user typed in, with (str)(hashlib.md5(pw.encode('utf-8')).hexdigest()).
The hash of the password is b'¥_ÆMÐ1;2±*öªÝ='. However, when I run the above code, I get b'\xa5\x83_\xc6\x85M\xd01;2\xb1*\xf6\xaa\xdd='.
For this reason, I can't compare these two strings. I'm looking for a function that can convert b'\xa5\x83_\xc6\x85M\xd01;2\xb1*\xf6\xaa\xdd=' to b'¥_ÆMÐ1;2±*öªÝ=' logically (each of the escape codes to its Unicode counterpart).
(The hash is of "lenny" if it helps. Here is a link to my code.)
Use .digest() instead of .hexdigest() if you want the raw bytes from the hash context.
Edit: line 14 of your pastebin should be:
if hashlib.md5(lol.encode('utf-8')).digest() == b'\xa5\x83_\xc6\x85M\xd01;2\xb1*\xf6\xaa\xdd=':
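As a minimal sketch of the difference between the two methods: hexdigest() returns a str of hex digits, digest() returns the raw bytes, and a comparison only works when both sides use the same form. The names pw and typed are illustrative only, and "lenny" is the example password from the question.

```python
import hashlib

pw = "lenny"      # stored password (example value from the question)
typed = "lenny"   # hypothetical user input

stored_hex = hashlib.md5(pw.encode("utf-8")).hexdigest()  # str of hex digits
stored_raw = hashlib.md5(pw.encode("utf-8")).digest()     # raw bytes

typed_hash = hashlib.md5(typed.encode("utf-8"))

# Compare like with like: hex string with hex string, raw bytes with raw bytes.
hex_match = typed_hash.hexdigest() == stored_hex
raw_match = typed_hash.digest() == stored_raw

# Both forms carry the same information; one converts into the other.
same_info = bytes.fromhex(stored_hex) == stored_raw
```

Either comparison works. Mixing them (a hex string on one side, raw bytes on the other) never matches.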
Related
If 'a' is hashed,
import hashlib
hash = hashlib.sha1(b'aa')
hex_Hash = hash.hexdigest()
print(hex_Hash)
I wrote the above code, but the desired result (38469e8ea8e72d0b889f1905195e2f4b79b5bb50) does not come out. How should I write the code?
I have no idea where you got adc83b...fc from, but it is the sha-1 hash of the byte string containing a single newline: b'\n'.
>>> hashlib.sha1(b'\n').hexdigest()
'adc83b19e793491b1c6ea0fd8b46cd9f32e592fc'
========
Based on the comments below, it seems the original poster is actually asking how to turn a hex string into bytes. The hexdigest was just intended as an example of the reverse of what they were looking for, I think.
bytearray.fromhex('aa')
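To make the distinction concrete, here is a small sketch, assuming the goal is to hash the single byte 0xaa rather than the two-character text "aa". bytes.fromhex is used as the immutable counterpart of bytearray.fromhex, and the variable names are illustrative.

```python
import hashlib

raw = bytes.fromhex("aa")   # one byte with the value 0xaa
text = b"aa"                # two ASCII bytes, 0x61 0x61

hash_of_raw = hashlib.sha1(raw).hexdigest()
hash_of_text = hashlib.sha1(text).hexdigest()

# The inputs differ, so the two digests differ as well.
print(raw, hash_of_raw)
print(text, hash_of_text)
```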
I'm trying to hash a password for a login system I am creating. I am using hashlib with the blake2b hash algorithm, but I can't figure out how to hash a variable such as passwordEntry. All the hashlib examples hash literal characters, for example: blake2b(b'IWantToHashThis'). I am quite confused about why the letter "b" has to be included. If I try to hash a variable, the "b" can't be attached to the variable I want to hash.
Example of me trying to hash a variable: blake2b(passwordEntry)
Another example of me trying to hash the variable: blake2b(b passwordEntry)
In the second example, hashlib thinks I am trying to hash a variable called "b passwordEntry". As I said before, the "b" has to be included for the hashing algorithm to perform correctly. Sorry for the long question; I understand if it is hard to follow.
The letter b only works directly before a quoted literal (", ', """ or ''').
It marks that literal as bytes rather than str.
To get bytes from a string you can write b"string" or call "string".encode(). In your case only the encode() method of str works, because the b prefix applies only to string literals, not to variables.
So in your case it will be blake2b(passwordEntry.encode())
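A short sketch of that call, assuming passwordEntry holds a str. The value assigned here is a stand-in for real user input. It shows that encoding the variable gives the same digest as the b'...' literal from the question.

```python
import hashlib

passwordEntry = "IWantToHashThis"  # hypothetical value; normally user input

# str -> bytes; encode() uses UTF-8 by default
digest_from_variable = hashlib.blake2b(passwordEntry.encode()).hexdigest()

# The same bytes written as a literal with the b prefix
digest_from_literal = hashlib.blake2b(b"IWantToHashThis").hexdigest()

print(digest_from_variable == digest_from_literal)  # True
```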
This is my first test code:
import hashlib
md5Hash = hashlib.md5()
md5Hash.update('Coconuts')
print md5Hash.hexdigest()
md5Hash.update('Apples')
print md5Hash.hexdigest()
md5Hash.update('Oranges')
print md5Hash.hexdigest()
And this is my second chunk of code:
import hashlib
md5Hash = hashlib.md5()
md5Hash.update('Coconuts')
print md5Hash.hexdigest()
md5Hash.update('Bananas')
print md5Hash.hexdigest()
md5Hash.update('Oranges')
print md5Hash.hexdigest()
But the output for 1st code is:
0e8f7761bb8cd94c83e15ea7e720852a
217f2e2059306ab14286d8808f687abb
4ce7cfed2e8cb204baeba9c471d48f07
And for the second code is:
0e8f7761bb8cd94c83e15ea7e720852a
a82bf69bf25207f2846c015654ae68d1
47dba619e1f3eaa8e8a01ab93c79781e
I changed the second string from 'Apples' to 'Bananas', and the third string stays the same. But I am still getting a different result for the third string. Hashing is supposed to give the same result every time.
Am I missing something?
hashlib.md5.update() adds data to the hash. It doesn't replace the existing values; if you want to hash a new value, you need to initialize a new hashlib.md5 object.
The values you're hashing are:
"Coconuts" -> 0e8f7761bb8cd94c83e15ea7e720852a
"CoconutsApples" -> 217f2e2059306ab14286d8808f687abb
"CoconutsApplesOranges" -> 4ce7cfed2e8cb204baeba9c471d48f07
"Coconuts" -> 0e8f7761bb8cd94c83e15ea7e720852a
"CoconutsBananas" -> a82bf69bf25207f2846c015654ae68d1
"CoconutsBananasOranges" -> 47dba619e1f3eaa8e8a01ab93c79781e
Because you're using the update method, the same md5Hash object is reused for all 3 strings. Each printed value is therefore the hash of everything passed in so far, concatenated together. So changing the second string changes the outcome of the 3rd print as well.
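This can be checked directly. A minimal sketch, written for Python 3 (hence the bytes literals), comparing an object that is updated twice with one-shot hashes of the concatenated and of the second string:

```python
import hashlib

incremental = hashlib.md5()
incremental.update(b"Coconuts")
incremental.update(b"Apples")

one_shot = hashlib.md5(b"CoconutsApples")
apples_only = hashlib.md5(b"Apples")

# update() appends to the data already hashed.
print(incremental.hexdigest() == one_shot.hexdigest())     # True
print(incremental.hexdigest() == apples_only.hexdigest())  # False
```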
You need to create a separate md5 object for each string. Use a loop (Python 3 requires the bytes prefix, BTW, and the same code also works in Python 2):
import hashlib
for s in (b'Coconuts', b'Bananas', b'Oranges'):
    md5Hash = hashlib.md5(s)  # no need for update, pass data at construction
    print(md5Hash.hexdigest())
result:
0e8f7761bb8cd94c83e15ea7e720852a
1ee31b77d0697c36914b99d1428f7f32
62f2b77089fea4c595e895901b63c10b
Note that the values are now different from your output, but each one is the MD5 of a single string, computed independently.
Expected result
What you are expecting is generally what you should be expecting from common cryptographic libraries. In most cryptographic libraries the hash object is reset after calling a method that finalizes the calculation such as hexdigest. It seems that hashlib.md5 uses alternate behavior.
Result by hashlib.md5
MD5 requires the input to be padded with a 1 bit, zero or more 0 bits and the length of the input in bits. Then the final hash value is calculated. hashlib.md5 internally seems to perform the final calculation using separate variables, keeping the state after hashing each string without this final padding.
So the result of your hashes is the concatenation of the earlier strings with the given string, followed by the correct padding, as duskwulf pointed out in his answer.
This is correctly documented by hashlib:
hash.digest()
Return the digest of the strings passed to the update() method so far. This is a string of digest_size bytes which may contain non-ASCII characters, including null bytes.
and
hash.hexdigest()
Like digest() except the digest is returned as a string of double length, containing only hexadecimal digits. This may be used to exchange the value safely in email or other non-binary environments.
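The quoted text is from the Python 2 documentation. In Python 3, digest() returns a bytes object and hexdigest() a str. The sketch below, written for Python 3, illustrates the two documented points: the hex form is twice as long as the raw form, and reading the digest does not reset the object.

```python
import hashlib

h = hashlib.md5(b"Coconuts")

raw = h.digest()         # bytes, h.digest_size long
hex_text = h.hexdigest() # str, twice as long, hex digits only

# Reading the digest leaves the internal state untouched.
first = h.hexdigest()
second = h.hexdigest()

# A later update() continues from the data hashed so far.
h.update(b"Apples")
after_update = h.hexdigest()
```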
Solution for hashlib.md5
As there doesn't seem to be a reset() method you should create a new md5 object for each separate hash value you want to create. Fortunately the hash objects themselves are relatively lightweight (even if the hashing itself isn't) so this won't consume many CPU or memory resources.
Discussion of the differences
For hashing itself resetting the hash in the finalizer may not make all that much sense. But it does matter for signature generation: you might want to initialize the same signature instance and then generate multiple signatures with it. The hash function should reset so it can calculate the signature over multiple messages.
Sometimes an application requires an aggregate hash over multiple inputs, including intermediate hash results. In that case, however, a Merkle tree of hashes is used, where the intermediate hashes themselves are hashed again.
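For illustration only, here is a heavily simplified sketch of that idea. The function name merkle_root is hypothetical, SHA-256 is an arbitrary choice, and repeating the last hash on odd-sized levels is just one of several conventions in use. Real designs usually also distinguish leaf nodes from inner nodes.

```python
import hashlib

def merkle_root(chunks):
    """Return a hex root hash for a non-empty list of byte strings."""
    if not chunks:
        raise ValueError("need at least one chunk")
    # Leaf level: hash every input on its own.
    level = [hashlib.sha256(c).digest() for c in chunks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd count: repeat the last hash
        # Next level: hash each adjacent pair of intermediate hashes.
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()

root = merkle_root([b"Coconuts", b"Apples", b"Oranges"])
changed = merkle_root([b"Coconuts", b"Bananas", b"Oranges"])
print(root != changed)  # True: changing one input changes the root
```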
As indicated, I consider this bad API design by the authors of hashlib. For cryptographers it certainly doesn't follow the rule of least surprise.
I'm new to encryption, and programming in general. I'm just trying to get my head wrapped around some basic concepts.
I'm using python, Crypto.Hash.SHA256
from Crypto.Hash import SHA256
In the REPL if I type
print SHA256.new('password').digest()//j���*�rBo��)'s`=
vs
SHA256.new('password').digest()//"^\x88H\x98\xda(\x04qQ\xd0\xe5o\x8d\xc6)'s`=\rj\xab\xbd\xd6*\x11\xefr\x1d\x15B\xd8"
What are these two outputs?
How are they supposed to be interpreted?
In the first case, you are using print, so Python writes the raw bytes to the terminal, which tries to display them as characters. Unfortunately, not every byte is printable, so you get some strange output.
In the second case, since you are not calling print, the Python interpreter does something different. It takes the return value, which is a string in this case, and shows its repr(), the escaped representation of the string. That is why some characters appear as themselves, while others appear as an escape sequence such as \x88.
The two outputs happen to just be two representations of the same digest.
FYI, when working with pycrypto and looking at hash function outputs, I highly recommend using hexdigest instead of digest.
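A small sketch of the three views of one digest. It uses the standard library's hashlib.sha256 in place of Crypto.Hash.SHA256 (a substitution made here so the example runs without PyCrypto) and is written for Python 3, where digest() returns bytes.

```python
import hashlib

digest = hashlib.sha256(b"password").digest()

as_repr = repr(digest)  # escaped form, like the REPL echo in the question
as_hex = digest.hex()   # same text that hexdigest() returns

print(as_repr)
print(as_hex)
```

The hex form contains only the characters 0-9 and a-f, so it is safe to print, log, or paste anywhere.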
I've written an encryption program in Python, and one of its options is MD5 encryption. When I run a known string through my MD5 routine, I get a different hash value than when I run the EXACT same string through an MD5 encryption website or cryptofox for Firefox.
e.g. my program's hash output - fe9c25d61e56054ea87703e30c672d91 - plaintext: g4m3
e.g. online hash / cryptofox - 26e4477a0fa9cb24675379331dba9c84 - plaintext: g4m3
EXACT same word, 2 different hash values.
Now here's my code snippet:
word="g4m3"
string=md5.new(word).hexdigest()
print string
You included a newline in your MD5 input string:
>>> import md5
>>> word="g4m3"
>>> md5.new(word).hexdigest() # no newline
'26e4477a0fa9cb24675379331dba9c84'
>>> md5.new(word + '\n').hexdigest() # with a newline
'fe9c25d61e56054ea87703e30c672d91'
When reading data from a file, make sure you remove the newline character at the end of the lines. You can use .rstrip('\n') to just remove line separation characters from the end of the line, or use .strip() to remove all whitespace from start or end of the line:
>>> word = 'g4m3\n'
>>> md5.new(word).hexdigest()
'fe9c25d61e56054ea87703e30c672d91'
>>> word = word.strip()
>>> md5.new(word).hexdigest()
'26e4477a0fa9cb24675379331dba9c84'
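The md5 module used above exists only in Python 2. A rough Python 3 equivalent of the same check, using hashlib, might look like this. The helper md5_hex is hypothetical, and the UTF-8 encoding is an assumption.

```python
import hashlib

def md5_hex(text):
    """MD5 hex digest of a str, encoded as UTF-8 (hypothetical helper)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

line = "g4m3\n"  # e.g. a line exactly as read from a file
with_newline = md5_hex(line)
without_newline = md5_hex(line.rstrip("\n"))

print(with_newline == md5_hex("g4m3"))     # False: the newline is part of the input
print(without_newline == md5_hex("g4m3"))  # True
```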
As to your question: Hashing is very sensitive. Even a single character of difference can result in a radically different output string. It may be the case that the online implementation is appending a whitespace char, or more likely, a newline. This extra character will change the output of the algorithm. (It's also possible the opposite is happening: you are appending a newline and the online one is not)
As to MD5 "encryption":
MD5 is NOT encryption. It is hashing. Encryption takes in data and a key, and produces output that looks random. The original data is recoverable with the key. Hashing, on the other hand, takes in data and produces a fixed-size value that REPRESENTS the original data. The original data, however, unless stored elsewhere, is lost.
More information for reference:
Another interesting difference is the output the various types of algorithms produce. Encryption can take in any amount of data (within the limits of the OS/software, of course) and outputs data approximately equal in size to the input. Hashing does not. Since a hash is a mere representation of the data, its output has a fixed, limited size. This can pose problems. For instance, because there are more possible inputs than possible hash values, two entirely different pieces of data will eventually have the same hash. For this reason, when using hashing to compare two values, it is usually a good idea to compare hashes from two separate algorithms as well. The probability that two different pieces of data have TWO EQUAL HASHES is astronomically low.
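A minimal sketch of the two-hash comparison described above. The helper names fingerprints and probably_equal are hypothetical, and the pairing of MD5 with SHA-1 is only an example choice.

```python
import hashlib

def fingerprints(data):
    """Return the (MD5, SHA-1) hex digests of a bytes value."""
    return hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest()

def probably_equal(a, b):
    # Treat the inputs as equal only if both digests agree.
    return fingerprints(a) == fingerprints(b)

print(probably_equal(b"g4m3", b"g4m3"))    # True
print(probably_equal(b"g4m3", b"g4m3\n"))  # False
```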
Of course, then you get into hashing algorithms that utilize encryption methods at their core, but I won't go into that here.