I'm trying to write a simple Python solution to encrypt a file securely using a passphrase. I figured I would use something like bcrypt or pbkdf2 so that as time goes on, I could make my password hashes more and more difficult to brute force. I also figured I would use AES for the actual encryption, as it's a pretty safe standard. I'm not fixed on the encryption cipher, but I really like bcrypt.
I'm having quite a difficult time figuring out how to actually perform the encryption. Let's say I have a passphrase and a file I'd like to encrypt. I'd assume that I essentially need to do something like this:
from Crypto.Cipher import AES
from bcrypt import gensalt, hashpw
from hashlib import sha256
def encryptify(passphrase, file_name):
target_file = open(file_name, 'r')
# generate password, takes time
passphrase_rounds = 15
passphrase_salt = gensalt(rounds)
passphrase = sha256(hashpw(passphrase, passphrase_salt)).hexdigest()
# encrypt the file
encrypted_file = AES.new(passphrase, AES.MODE_CBC).encrypt(target_file.read())
At the final step, it fails with a ValueError, telling me that my key must be 16, 24, or 32 bytes long. What I'm not understanding is if what I'm doing is secure and why the last step is failing. I thought that SHA256 outputs 32 characters of data?
I'm particularly concerned about taking a bcrypt passphrase and throwing it through sha256, are there any potential security risks by doing this? I wouldn't imagine so, but then again, I'm not a cryptographer.
I can't comment about safety, but if you want your actual 32 bytes of SHA256, you need to call digest, not hexdigest. hexdigest returns a hexadecimal string representation (that would be 64 characters).
Related
I want to encrypt a .zip file using AES256 in Python. I am aware of the Python cryptography module, in particular the example given at:
https://cryptography.io/en/latest/fernet/
However, I have needs that are a bit different:
I want to output binary data (because I want a small encrypted file). How can I output in binary instead of armored ASCII?
I do not want to have the plaintext timestamp. Any way to remove it?
If I cannot fix those points I will use another method. Any suggestions? I was considering issuing gpg commands through subprocess.
Looking at Fernet module, seems it encrypts and authenticates the data. Actually its safer than only encrypting (see here). However, removing the timestamp, in the case of this module, doesn't make sense if you also want to authenticate.
Said that, seems you want to risky and only encrypt instead of encrypt and authenticate. You might follow the examples of the same module found at https://cryptography.io/en/latest/hazmat/primitives/symmetric-encryption/. Just make sure this is what you really want.
As you're worried about size and want to use AES, you could try AES in CTR mode, which does not need padding, avoiding extra bytes at the end.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend
backend = default_backend()
key = os.urandom(32)
nonce = os.urandom(16)
cipher = Cipher(algorithms.AES(key), modes.CTR(nonce), backend=backend)
encryptor = cipher.encryptor()
ct = encryptor.update(b"a secret message") + encryptor.finalize()
print(ct)
decryptor = cipher.decryptor()
print(decryptor.update(ct) + decryptor.finalize())
So, answering your questions:
(1) The update method already returns a byte array.
(2) This way there will be no plaintext data automatically appended to the ciphertext (but be aware of the security implications about not authenticating the data). However, you'll need to pass the IV anyway, what you would have to do in either case.
Given this code:
import hashlib
h = hashlib.md5()
h.update(str("foobar").encode('utf-8'))
Would the same digest be returned on any system?
If not, is there a way to ensure that it does?
Yes, a correctly implemented MD5 algorithm will always produce the same digest for the same series of bytes on any system, since that is exactly what checksum algorithms like MD5 are for.
As a side note: "foobar" is already a string, so str("foobar") is doing nothing at all in your code above. A correct version of the code would be:
import hashlib
h = hashlib.md5()
h.update("foobar".encode('utf-8'))
Also, note that MD5 is not a secure cryptographic hash function. It's fine to use it as a checksum to guard against accidental corruption, but it cannot be used to verify that data hasn't been intentionally altered.
I'm developing a web app (using gevent, but that is not significant) that has to write some confidential information in log. The obvious idea is to encrypt the confidential information using a public key that is hard-coded into my application. To read it, one would need a private key, and 2048-bit RSA seems to be safe enough. I have chosen pycrypto (tried M2Crypto as well, but found nearly no differences for my purpose) and implemented log encryption as a logging.Formatter subclass. However, I'm new to pycrypto and cryptoraphy, and I am not sure my choice of the way my data is encrypted is reasonable. Is PKCS1_OAEP module what I need? Or there are more friendly ways of encryption without dividing the data in small chunks?
So, what I did is:
import logging
import sys
from Crypto.Cipher import PKCS1_OAEP as pkcs1
from Crypto.PublicKey import RSA
PUBLIC_KEY = """ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDe2mtK03UhymB+SrIbJJUwCPhWNMl8/gA9d7jex0ciSuFfShDaqJ4wYWG4OOl\
VqKMxPrPcZ/PMSwtc021yI8TXfgewb65H/YQw4JzzGANq2+mFT8jWRDn+xUc6vcWnXIG3OPg5DvIipGQvIPNIUUP3qE7yDHnS5xdVdFrVe2bUUXmZJ9\
0xJpyqlTuRtIgfIfEQC9cggrdr1G50tXdXZjS0M1WXl5P6599oH/ykjpDFrCnh5fz9WDwUc0mNJ+11Qh+yfDp3k7AhzhRaROKLVWnfkklFaFm7LsdVX\
KPjp7dPRcTb84c2OnlIjU0ykL74Fy0K3eaPvM6TLe/K1XuD3933 pupkin#pupkin"""
PUBLIC_KEY = RSA.importKey(PUBLIC_KEY)
LOG_FORMAT = '[%(asctime)-15s - %(levelname)s: %(message)s]'
# May be more, but there is a limit.
# I suppose, the algorithm requires enough padding,
# and size of padding depends on key length.
MAX_MSG_LEN = 128
# Size of a block encoded with padding. For a 2048-bit key seems to be OK.
ENCODED_CHUNK_LEN = 256
def encode_msg(msg):
res = []
k = pkcs1.new(PUBLIC_KEY)
for i in xrange(0, len(msg), MAX_MSG_LEN):
v = k.encrypt(msg[i : i+MAX_MSG_LEN])
# There are nicer ways to make a readable line from data than using hex. However, using
# hex representation requires no extra code, so let it be hex.
res.append(v.encode('hex'))
assert len(v) == ENCODED_CHUNK_LEN
return ''.join(res)
def decode_msg(msg, private_key):
msg = msg.decode('hex')
res = []
k = pkcs1.new(private_key)
for i in xrange(0, len(msg), ENCODED_CHUNK_LEN):
res.append(k.decrypt(msg[i : i+ENCODED_CHUNK_LEN]))
return ''.join(res)
class CryptoFormatter(logging.Formatter):
NOT_SECRET = ('CRITICAL',)
def format(self, record):
"""
If needed, I may encode only certain types of messages.
"""
try:
msg = logging.Formatter.format(self, record)
if not record.levelname in self.NOT_SECRET:
msg = encode_msg(logging.Formatter.format(self, record))
return msg
except:
import traceback
return traceback.format_exc()
def decrypt_file(key_fname, data_fname):
"""
The function decrypts logs and never runs on server. In fact,
server does not have a private key at all. The only key owner
is server admin.
"""
res = ''
with open(key_fname, 'r') as kf:
pkey = RSA.importKey(kf.read())
with open(data_fname, 'r') as f:
for l in f:
l = l.strip()
if l:
try:
res += decode_msg(l, pkey) + '\n'
except Exception: # A line may be unencrypted
res += l + '\n'
return res
# Unfortunately dictConfig() does not support altering formatter class.
# Anyway, in demo code I am not going to use dictConfig().
logger = logging.getLogger()
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(CryptoFormatter(LOG_FORMAT))
logger.handlers = []
logger.addHandler(handler)
logging.warning("This is secret")
logging.critical("This is not secret")
UPDATE: Thanks to the accepted answer below, now I see:
My solution seems to be pretty valid for now (very few log entries, no performance considerations, more or less trusted storage). Concerning security, the best thing I can do right now is not forgetting to prohibit the user who runs my daemon from writing to the .py and .pyc files of the program. :-) However, if the user is compromised, he still may try to attach a debugger to my daemon process, so I should also disable login for him. Pretty obvious moments, but very important ones.
Surely there are solutions being much more scalable. A very common technique is to encrypt AES keys with slow but reliable RSA, and to encrypt data with the AES that is pretty fast. Data encryption in the case is symmetric, but retrieving the AES key requires either breaking RSA, or getting it from memory when my program is running. Stream encryption with higher-level libraries and binary log file format also are a way to go, though binary log format encrypted as a stream should be very vulnerable to log corruption, even a sudden reboot due to electricity blackout may be a problem unless I do some things at a lower level (at least log rotation on each daemon start).
I changed .encode('hex') to .encode('base64').replace('\n').replace('\r'). Fortunately, the base64 codec works fine with no line ends. It saves some space.
Using an untrusted storage may require signing records, but that seems to be another story.
Checking if the string is encrypted based on catching exceptions is ok, since, unless the log is tampered with by a malicious user, it's base64 codec who raises an exception, not RSA decryption.
You seem to encrypt data directly with RSA. This is relatively slow, and has the problem that you can only encrypt small parts of data. Distinguishing encrypted from plaintext data based on "decryption doesn't work" is also not a very clean solution, although it will probably work. You do use OAEP, which is good. You may want to use base64 instead of hex to save space.
However, crypto is easy to get wrong. For this reason, you should always use high-level crypto libraries wherever possible. Anything where you have to specify padding schemes yourself isn't "high-level". I am not sure if you will be able to create an efficient, line-based log encryption system without resorting to rather low-level libraries, though.
If you have no reason to encrypt only individual parts of the log, consider just encrypting the entire thing.
If you are really desperate for a line-based encryption, what you could do is the following: Create a random symmetric AES key from a secure randomness source, and give it a short but unique ID. Encrypt this key with RSA, and write the result to the log file in a line prefixed with a tag, e.g. "KEY", together with the ID. For each log line, generate a random IV, encrypt the message with AES256 in CBC mode using said IV (you don't have any length limits per line now!) and write the key ID, IV and the encrypted message to the log, prefixed with a tag, e.g. "ENC". After a certain time, destroy the symmetric key and repeat (generate new one, write to log). The disadvantage of this approach is that an attacker who can recover the symmetric key from memory can read the messages encrypted with said key. The advantage is that you can use higher-level building blocks and it is much, much faster (on my CPU, you can encrypt 70,000 log lines of 1 KB per second with AES-128, but only around 3,500 chunks of max. 256 bytes with RSA2048). RSA decryption is REALLY slow, by the way (around 100 chunks per second).
Note that you have no authentication, i.e. you won't notice modifications to your logs. For this reason, I assume you trust the log storage. Otherwise, see RFC 5848.
I just create a new aplication in python for registration.
I save all the fields in database and user created successfully but the password is save same as it is we filled at the time of registration.
How do I encrypt or use default functonality of python for password.
Please suggest me?
To make offline password cracking more expensive, you could use bcrypt.
If you are limited to the stdlib, there is crypt module on Unix:
hashed = crypt.crypt(plaintext)
you should hash the passwords, the following code hashes the raw-input password according to your PASSWORD_HASHERS in settings.py
from django.contrib.auth.hashers import make_password
pass = make_password(raw_pass) # hashing is done here
user.set_password(pass)
Don't implement such stuff yourself or you likely will do it wrong.
For password storage, using some reversible encoding or symetric encryption or a simple hash from hashlib or even a randomly salted hash are all major FAILURES nowadays.
If you are using django, use some strong algorithm provided by django (usually one of: bcrypt, pbkdf2, sha512_crypt).
When not using django: use passlib - after reading its documentation.
http://code.google.com/p/passlib/
Hash the password upon getting it from the user (and on registration) to encrypt it.
import hashlib
m = hashlib.sha1()
m.update("My users' password here")
m.digest()
Ref: http://docs.python.org/2/library/hashlib.html#module-hashlib
For actual encryption, you can try M2Crypto or PyCrypto. Those are probably what you are looking for; however, there are other ways to obfuscate your passwords for the average user:
(if you would like to read some more answers as to what encryption method might suit you best, check out this somewhat related SO post: Recommended Python cryptographic module?
hashlib will provide various hash algorithms (ex. "SHA1, SHA224, SHA256, SHA384, and SHA512"). A simple example:
import hashlib
enc = hashlib.md5()
enc.update("Somerandompassword")
print enc.hexdigest()
And this will print you the md5 "Somerandompassword":
c5532f9e756b4583db4c627c8aa7d303
However, for (base64) encoding, for example, try:
import base64
enc = base64.b64encode("Somerandompassword")
and decoding
dec = base64.b64decode("U29tZXJhbmRvbXBhc3N3b3Jk")
print dec
will print: Somerandompassword
I'm trying to understand how does Linux encrypt our password on the etc/shadow file, so I've dont a new virtual 'test' user to make some test:
user: newuser
password: usrpw123
Generated salt: Ii4CGbr7
So the OS makes me the following line on the etc/shadow file, using a SHA512 encryptation system ($6$): newuser:$6$Ii4CGbr7$IOua8/oPV79Yp.BwzpxlSHjmCvRfTomZ.bhEvjZV2x5qhrvk82lZVrEtWQQej2pOWMdN7hvKwNgvCXKFQm5CB/:15069:0:99999:7:::
Now, I take the SHA512 module from python and try this:
import hashlib
m = hashlib.sha512()
m.update('Ii4CGbr7'+'usrpw123')
print m.hexdigest
This gives me the following hash as a result:
c73156daca3e31125ce457f1343201cc8a26400b2974440af2cc72687922b48b6631d21c186796ea2756ad987a996d2b261fe9ff3af4cc81e14c3029eac5df55
As you can see, it's different than the other one on the /etc/shadow file, and I dont know why if I'm using the same salt+password to generate the hash.
Can someone give me a hand and explain me more or less why this happens?
And also, why does the /etc/shadow files generates a hash with some dots (.)?
Thanks
The fields in /etc/shadow are not built or interpreted the way you think they are. You'll want to read the man page for details, but the most obvious difference is that it uses an unusual base64 encoding for both the salt and the hash.
There is an algorithm for generating the password hashes found in /etc/shadow.
See this document for an explanation:
http://www.akkadia.org/drepper/SHA-crypt.txt
There's an implementation of this in python here:
http://packages.python.org/passlib/lib/passlib.hash.sha512_crypt.html
I fell into the same trap as everything that I read lead me to believe you could retrieve the results the same way you have it written.
I was able to determine the password by using the salt and password using crypt.crypt()
import crypt
crypt.crypt(password, salt)
salt: $6$Ii4CGbr7
password: usrpw123
doesn't exactly use the hashlib library but it works.