Are MD5 digests consistent across different systems? - python

Given this code:
import hashlib
h = hashlib.md5()
h.update(str("foobar").encode('utf-8'))
Would the same digest be returned on any system?
If not, is there a way to ensure that it does?

Yes, a correctly implemented MD5 algorithm will always produce the same digest for the same series of bytes on any system, since that is exactly what checksum algorithms like MD5 are for.
As a side note: "foobar" is already a string, so str("foobar") is doing nothing at all in your code above. A correct version of the code would be:
import hashlib
h = hashlib.md5()
h.update("foobar".encode('utf-8'))
Also, note that MD5 is not a secure cryptographic hash function. It's fine to use it as a checksum to guard against accidental corruption, but it cannot be used to verify that data hasn't been intentionally altered.

Related

python symmetric encryption to binary without timestamp

I want to encrypt a .zip file using AES256 in Python. I am aware of the Python cryptography module, in particular the example given at:
https://cryptography.io/en/latest/fernet/
However, I have needs that are a bit different:
I want to output binary data (because I want a small encrypted file). How can I output in binary instead of armored ASCII?
I do not want to have the plaintext timestamp. Any way to remove it?
If I cannot fix those points I will use another method. Any suggestions? I was considering issuing gpg commands through subprocess.
Looking at Fernet module, seems it encrypts and authenticates the data. Actually its safer than only encrypting (see here). However, removing the timestamp, in the case of this module, doesn't make sense if you also want to authenticate.
Said that, seems you want to risky and only encrypt instead of encrypt and authenticate. You might follow the examples of the same module found at https://cryptography.io/en/latest/hazmat/primitives/symmetric-encryption/. Just make sure this is what you really want.
As you're worried about size and want to use AES, you could try AES in CTR mode, which does not need padding, avoiding extra bytes at the end.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend
backend = default_backend()
key = os.urandom(32)
nonce = os.urandom(16)
cipher = Cipher(algorithms.AES(key), modes.CTR(nonce), backend=backend)
encryptor = cipher.encryptor()
ct = encryptor.update(b"a secret message") + encryptor.finalize()
print(ct)
decryptor = cipher.decryptor()
print(decryptor.update(ct) + decryptor.finalize())
So, answering your questions:
(1) The update method already returns a byte array.
(2) This way there will be no plaintext data automatically appended to the ciphertext (but be aware of the security implications about not authenticating the data). However, you'll need to pass the IV anyway, what you would have to do in either case.

password encryption on python?

I just create a new aplication in python for registration.
I save all the fields in database and user created successfully but the password is save same as it is we filled at the time of registration.
How do I encrypt or use default functonality of python for password.
Please suggest me?
To make offline password cracking more expensive, you could use bcrypt.
If you are limited to the stdlib, there is crypt module on Unix:
hashed = crypt.crypt(plaintext)
you should hash the passwords, the following code hashes the raw-input password according to your PASSWORD_HASHERS in settings.py
from django.contrib.auth.hashers import make_password
pass = make_password(raw_pass) # hashing is done here
user.set_password(pass)
Don't implement such stuff yourself or you likely will do it wrong.
For password storage, using some reversible encoding or symetric encryption or a simple hash from hashlib or even a randomly salted hash are all major FAILURES nowadays.
If you are using django, use some strong algorithm provided by django (usually one of: bcrypt, pbkdf2, sha512_crypt).
When not using django: use passlib - after reading its documentation.
http://code.google.com/p/passlib/
Hash the password upon getting it from the user (and on registration) to encrypt it.
import hashlib
m = hashlib.sha1()
m.update("My users' password here")
m.digest()
Ref: http://docs.python.org/2/library/hashlib.html#module-hashlib
For actual encryption, you can try M2Crypto or PyCrypto. Those are probably what you are looking for; however, there are other ways to obfuscate your passwords for the average user:
(if you would like to read some more answers as to what encryption method might suit you best, check out this somewhat related SO post: Recommended Python cryptographic module?
hashlib will provide various hash algorithms (ex. "SHA1, SHA224, SHA256, SHA384, and SHA512"). A simple example:
import hashlib
enc = hashlib.md5()
enc.update("Somerandompassword")
print enc.hexdigest()
And this will print you the md5 "Somerandompassword":
c5532f9e756b4583db4c627c8aa7d303
However, for (base64) encoding, for example, try:
import base64
enc = base64.b64encode("Somerandompassword")
and decoding
dec = base64.b64decode("U29tZXJhbmRvbXBhc3N3b3Jk")
print dec
will print: Somerandompassword

bcrypt passphrase with multiple rounds of AES encryption?

I'm trying to write a simple Python solution to encrypt a file securely using a passphrase. I figured I would use something like bcrypt or pbkdf2 so that as time goes on, I could make my password hashes more and more difficult to brute force. I also figured I would use AES for the actual encryption, as it's a pretty safe standard. I'm not fixed on the encryption cipher, but I really like bcrypt.
I'm having quite a difficult time figuring out how to actually perform the encryption. Let's say I have a passphrase and a file I'd like to encrypt. I'd assume that I essentially need to do something like this:
from Crypto.Cipher import AES
from bcrypt import gensalt, hashpw
from hashlib import sha256
def encryptify(passphrase, file_name):
target_file = open(file_name, 'r')
# generate password, takes time
passphrase_rounds = 15
passphrase_salt = gensalt(rounds)
passphrase = sha256(hashpw(passphrase, passphrase_salt)).hexdigest()
# encrypt the file
encrypted_file = AES.new(passphrase, AES.MODE_CBC).encrypt(target_file.read())
At the final step, it fails with a ValueError, telling me that my key must be 16, 24, or 32 bytes long. What I'm not understanding is if what I'm doing is secure and why the last step is failing. I thought that SHA256 outputs 32 characters of data?
I'm particularly concerned about taking a bcrypt passphrase and throwing it through sha256, are there any potential security risks by doing this? I wouldn't imagine so, but then again, I'm not a cryptographer.
I can't comment about safety, but if you want your actual 32 bytes of SHA256, you need to call digest, not hexdigest. hexdigest returns a hexadecimal string representation (that would be 64 characters).

MD5 and SHA-2 collisions in Python

I'm writing a simple MP3 cataloguer to keep track of which MP3's are on my various devices. I was planning on using MD5 or SHA2 keys to identify matching files even if they have been renamed/moved, etc. I'm not trying to match MP3's that are logically equivalent (i.e.: same song but encoded differently). I have about 8000 MP3's. Only about 6700 of them generated unique keys.
My problem is that I'm running into collisions regardless of the hashing algorithm I choose. In one case, I have two files that happen to be tracks #1 and #2 on the same album, they are different file sizes yet produce identical hash keys whether I use MD5, SHA2-256, SHA2-512, etc...
This is the first time I'm really using hash keys on files and this is an unexpected result. I feel something fishy is going on here from the little I know about these hashing algorithms. Could this be an issue related to MP3's or Python's implementation?
Here's the snippet of code that I'm using:
data = open(path, 'r').read()
m = hashlib.md5(data)
m.update(data)
md5String = m.hexdigest()
Any answers or insights to why this is happening would be much appreciated. Thanks in advance.
--UPDATE--:
I tried executing this code in linux (with Python 2.6) and it did not produce a collision. As demonstrated by the stat call, the files are not the same. I also downloaded WinMD5 and this did not produce a collision(8d327ef3937437e0e5abbf6485c24bb3 and 9b2c66781cbe8c1be7d6a1447994430c). Is this a bug with Python hashlib on Windows? I tried the same under Python 2.7.1 and 2.6.6 and both provide the same result.
import hashlib
import os
def createMD5( path):
fh = open(path, 'r')
data = fh.read()
m = hashlib.md5(data)
md5String = m.hexdigest()
fh.close()
return md5String
print os.stat(path1)
print os.stat(path2)
print createMD5(path1)
print createMD5(path2)
>>> nt.stat_result(st_mode=33206, st_ino=0L, st_dev=0, st_nlink=0, st_uid=0, st_gid=0, st_size=6617216L, st_atime=1303808346L, st_mtime=1167098073L, st_ctime=1290222341L)
>>> nt.stat_result(st_mode=33206, st_ino=0L, st_dev=0, st_nlink=0, st_uid=0, st_gid=0, st_size=4921346L, st_atime=1303808348L, st_mtime=1167098076L, st_ctime=1290222341L)
>>> a7a10146b241cddff031eb03bd572d96
>>> a7a10146b241cddff031eb03bd572d96
I sort of have the feeling that you are reading a chunk of data which is smaller than the expected, and this chunk happens to be the same for both files. I don't know why, but try to open the file in binary with 'rb'. read() should read up to end of file, but windows behaves differently. From the docs
On Windows, 'b' appended to the mode
opens the file in binary mode, so
there are also modes like 'rb', 'wb',
and 'r+b'. Python on Windows makes a
distinction between text and binary
files; the end-of-line characters in
text files are automatically altered
slightly when data is read or written.
This behind-the-scenes modification to
file data is fine for ASCII text
files, but it’ll corrupt binary data
like that in JPEG or EXE files. Be
very careful to use binary mode when
reading and writing such files. On
Unix, it doesn’t hurt to append a 'b'
to the mode, so you can use it
platform-independently for all binary
files.
The files you're having a problem with are almost certainly identical if several different hashing algorithms all return the same hash results on them, or there's a bug in your implementation.
As a sanity test write your own "hash" that just returns the file's contents wholly, and see if this one generates the same "hashes".
As others have stated, a single hash collision is unlikely, and multiple nigh on impossible, unless the files are identical. I would recommend generating the sums with an external utility as something of a sanity check. For example, in Ubuntu (and most/all other Linux distributions):
blair#blair-eeepc:~$ md5sum Bandwagon.mp3
b87cbc2c17cd46789cb3a3c51a350557 Bandwagon.mp3
blair#blair-eeepc:~$ sha256sum Bandwagon.mp3
b909b027271b4c3a918ec19fc85602233a4c5f418e8456648c426403526e7bc0 Bandwagon.mp3
A quick Google search shows there are similar utilities available for Windows machines. If you see the collisions with the external utilities, then the files are identical. If there are no collisions, you are doing something wrong. I doubt the Python implementation is wrong, as I get the same results when doing the hash in Python:
>>> import hashlib
>>> hashlib.md5(open('Bandwagon.mp3', 'r').read()).hexdigest()
'b87cbc2c17cd46789cb3a3c51a350557'
>>> hashlib.sha256(open('Bandwagon.mp3', 'r').read()).hexdigest()
'b909b027271b4c3a918ec19fc85602233a4c5f418e8456648c426403526e7bc0'
Like #Delan Azabani said, there is something fishy here; collisions are bound to happen, but not that often. Check if the songs are the same, and update your post please.
Also, if you feel that you don't have enough keys, you can use two (or even more) hashing algorithms at the same time: by using MD5 for example, you have 2**128, or 340282366920938463463374607431768211456 keys. By using SHA-1, you have 2**160 or 1461501637330902918203684832716283019655932542976 keys. By combining them, you have 2**128 * 2**160, or 497323236409786642155382248146820840100456150797347717440463976893159497012533375533056.
(But if you ask me, MD5 is more than enough for your needs.)

Problems with Python MD5, SHA512 (+salt) encryption

I'm trying to understand how does Linux encrypt our password on the etc/shadow file, so I've dont a new virtual 'test' user to make some test:
user: newuser
password: usrpw123
Generated salt: Ii4CGbr7
So the OS makes me the following line on the etc/shadow file, using a SHA512 encryptation system ($6$): newuser:$6$Ii4CGbr7$IOua8/oPV79Yp.BwzpxlSHjmCvRfTomZ.bhEvjZV2x5qhrvk82lZVrEtWQQej2pOWMdN7hvKwNgvCXKFQm5CB/:15069:0:99999:7:::
Now, I take the SHA512 module from python and try this:
import hashlib
m = hashlib.sha512()
m.update('Ii4CGbr7'+'usrpw123')
print m.hexdigest
This gives me the following hash as a result:
c73156daca3e31125ce457f1343201cc8a26400b2974440af2cc72687922b48b6631d21c186796ea2756ad987a996d2b261fe9ff3af4cc81e14c3029eac5df55
As you can see, it's different than the other one on the /etc/shadow file, and I dont know why if I'm using the same salt+password to generate the hash.
Can someone give me a hand and explain me more or less why this happens?
And also, why does the /etc/shadow files generates a hash with some dots (.)?
Thanks
The fields in /etc/shadow are not built or interpreted the way you think they are. You'll want to read the man page for details, but the most obvious difference is that it uses an unusual base64 encoding for both the salt and the hash.
There is an algorithm for generating the password hashes found in /etc/shadow.
See this document for an explanation:
http://www.akkadia.org/drepper/SHA-crypt.txt
There's an implementation of this in python here:
http://packages.python.org/passlib/lib/passlib.hash.sha512_crypt.html
I fell into the same trap as everything that I read lead me to believe you could retrieve the results the same way you have it written.
I was able to determine the password by using the salt and password using crypt.crypt()
import crypt
crypt.crypt(password, salt)
salt: $6$Ii4CGbr7
password: usrpw123
doesn't exactly use the hashlib library but it works.

Categories