I have a Python script that creates an ODBC connection. The ODBC connection is generated with a connection string, and in this connection string I have to include the username and password for the connection.
Is there an easy way to obscure this password in the file (just so that nobody can read the password when I'm editing the file)?
Base64 encoding is in the standard library and will do to stop shoulder surfers:
>>> import base64
>>> print(base64.b64encode("password".encode("utf-8")).decode("ascii"))
cGFzc3dvcmQ=
>>> print(base64.b64decode("cGFzc3dvcmQ=").decode("utf-8"))
password
Here is a simple method:
Create a python module - let's call it peekaboo.py.
In peekaboo.py, include both the password and any code needing that password
Create a compiled version - peekaboo.pyc - by importing this module (via python commandline, etc...).
Now, delete peekaboo.py.
You can now happily import peekaboo relying only on peekaboo.pyc. Since peekaboo.pyc is byte compiled it is not readable to the casual user.
This should be a bit more secure than base64 encoding, although it is vulnerable to a .pyc-to-.py decompiler.
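A minimal sketch of the steps above, assuming Python 3. There, importing a module normally writes the bytecode into __pycache__, and a source-less import needs peekaboo.pyc to sit where peekaboo.py used to be, so this sketch writes it there explicitly with py_compile. The password value and the temporary directory are made up for illustration. The last line also shows that the string constant is stored as-is inside the .pyc, so this only hides it from someone reading the file in an editor.

```python
import importlib
import os
import py_compile
import sys
import tempfile

# Hypothetical module contents, for illustration only.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "peekaboo.py")
pyc_path = os.path.join(workdir, "peekaboo.pyc")
with open(source_path, "w") as f:
    f.write("PASSWORD = 'not-a-real-password'\n")

# Write the bytecode next to the source instead of into __pycache__,
# so it can still be imported once the source file is gone.
py_compile.compile(source_path, cfile=pyc_path, doraise=True)
os.remove(source_path)

sys.path.insert(0, workdir)
importlib.invalidate_caches()
import peekaboo

print(peekaboo.PASSWORD)

# The constant is stored verbatim in the bytecode file.
with open(pyc_path, "rb") as f:
    visible_in_pyc = b"not-a-real-password" in f.read()
print(visible_in_pyc)
```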
Douglas F Shearer's is the generally approved solution in Unix when you need to specify a password for a remote login.
You add a --password-from-file option to specify the path and read plaintext from a file.
The file can then be in the user's own area protected by the operating system.
It also allows different users to automatically pick up their own file.
For passwords that the user of the script isn't allowed to know, you can run the script with elevated permissions and have the password file owned by that root/admin user.
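A rough sketch of such an option using argparse. The option name comes from the answer above; the helper names, the demo file, and the convention that the password is the first line of the file are my own assumptions.

```python
import argparse
import os
import tempfile


def build_parser():
    parser = argparse.ArgumentParser(description="Connect using a password kept in a file.")
    parser.add_argument("--password-from-file", required=True, metavar="PATH",
                        help="file whose first line is the password")
    return parser


def read_password(path):
    """Return the first line of the file, without the trailing newline."""
    with open(path, encoding="utf-8") as f:
        return f.readline().rstrip("\n")


# Demonstration with a throwaway file; mkstemp creates it readable and
# writable by the owner only.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("s3cret-example\n")

args = build_parser().parse_args(["--password-from-file", demo_path])
password = read_password(args.password_from_file)
os.remove(demo_path)
```

In a real script you would call parse_args() without arguments so that the path comes from the command line.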
If you are working on a Unix system, take advantage of the netrc module in the standard Python library. It reads passwords from a separate text file (.netrc), which has the format described here.
Here is a small usage example:
import netrc
# Define which host in the .netrc file to use
HOST = 'mailcluster.loopia.se'
# Read from the .netrc file in your home directory
secrets = netrc.netrc()
username, account, password = secrets.authenticators(HOST)
print(username, password)
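For reference, a .netrc entry is a sequence of machine / login / password tokens. Here is a small self-contained sketch that writes a throwaway file in that format and parses it by passing an explicit path to netrc.netrc(). The host name and credentials are made up.

```python
import netrc
import os
import tempfile

# Write a throwaway file in .netrc format (hypothetical host and credentials).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("machine db.example.com login scriptuser password not-a-real-password\n")
os.chmod(path, 0o600)  # keep it private, as you would for a real ~/.netrc

secrets = netrc.netrc(path)
login, account, password = secrets.authenticators("db.example.com")
os.remove(path)
print(login, password)
```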
How about importing the username and password from a file external to the script? That way even if someone got hold of the script, they wouldn't automatically get the password.
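One way to do that, sketched with configparser from the standard library. The section name, the keys, the DSN and the credentials are all made up for illustration, and the file is created here only so the example is self-contained.

```python
import configparser
import os
import tempfile

# Stand-in for a credentials file that would normally live outside the
# script's directory (and outside version control).
fd, cfg_path = tempfile.mkstemp(suffix=".ini")
with os.fdopen(fd, "w") as f:
    f.write("[odbc]\nusername = scriptuser\npassword = not-a-real-password\n")

# interpolation=None keeps any '%' characters in the password literal.
config = configparser.ConfigParser(interpolation=None)
config.read(cfg_path)
username = config["odbc"]["username"]
password = config["odbc"]["password"]

# Hypothetical connection string layout.
connection_string = "DSN=mydsn;UID={};PWD={}".format(username, password)
os.remove(cfg_path)
```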
The best solution, assuming the username and password can't be given at runtime by the user, is probably a separate source file containing only variable initialization for the username and password that is imported into your main code. This file would only need editing when the credentials change. Otherwise, if you're only worried about shoulder surfers with average memories, base64 encoding is probably the easiest solution. ROT13 is just too easy to decode manually, isn't case sensitive and retains too much meaning in its encoded state. Encode your password and user id outside the Python script. Have the script decode them at runtime for use.
Giving scripts credentials for automated tasks is always a risky proposal. Your script should have its own credentials and the account it uses should have no access other than exactly what is necessary. At least the password should be long and rather random.
base64 is the way to go for your simple needs. In Python 2 there is no need to import anything (the 'base64' string codec used here was removed in Python 3):
>>> 'your string'.encode('base64')
'eW91ciBzdHJpbmc=\n'
>>> _.decode('base64')
'your string'
A way that I have done this is as follows:
At the python shell:
>>> from cryptography.fernet import Fernet
>>> key = Fernet.generate_key()
>>> print(key)
b'B8XBLJDiroM3N2nCBuUlzPL06AmfV4XkPJ5OKsPZbC4='
>>> cipher = Fernet(key)
>>> password = "thepassword".encode('utf-8')
>>> token = cipher.encrypt(password)
>>> print(token)
b'gAAAAABe_TUP82q1zMR9SZw1LpawRLHjgNLdUOmW31RApwASzeo4qWSZ52ZBYpSrb1kUeXNFoX0tyhe7kWuudNs2Iy7vUwaY7Q=='
Then, create a module with the following code:
from cryptography.fernet import Fernet
# you store the key and the token
key = b'B8XBLJDiroM3N2nCBuUlzPL06AmfV4XkPJ5OKsPZbC4='
token = b'gAAAAABe_TUP82q1zMR9SZw1LpawRLHjgNLdUOmW31RApwASzeo4qWSZ52ZBYpSrb1kUeXNFoX0tyhe7kWuudNs2Iy7vUwaY7Q=='
# create a cipher and decrypt when you need your password
cipher = Fernet(key)
mypassword = cipher.decrypt(token).decode('utf-8')
Once you've done this, you can either import mypassword directly or you can import the token and cipher to decrypt as needed.
Obviously, there are some shortcomings to this approach. If someone has both the token and the key (as they would if they have the script), they can decrypt easily. However it does obfuscate, and if you compile the code (with something like Nuitka) at least your password won't appear as plain text in a hex editor.
For Python 3, obfuscation using base64 is done differently:
import base64
base64.b64encode(b'PasswordStringAsStreamOfBytes')
which results in
b'UGFzc3dvcmRTdHJpbmdBc1N0cmVhbU9mQnl0ZXM='
Note that this is the repr of a bytes object; the actual value is the part inside the quotes.
Decoding back to the original string:
base64.b64decode(b'UGFzc3dvcmRTdHJpbmdBc1N0cmVhbU9mQnl0ZXM=')
b'PasswordStringAsStreamOfBytes'
To use this result where string objects are required, the bytes object can be decoded:
raw = base64.b64decode(b'UGFzc3dvcmRTdHJpbmdBc1N0cmVhbU9mQnl0ZXM=')
secret = raw.decode('utf-8')
print(secret)
For more information on how Python 3 handles bytes (and strings accordingly), please see the official documentation.
This is a pretty common problem. Typically the best you can do is to either
A) create some kind of Caesar cipher function to encode/decode (just not rot13)
or
B) the preferred method: use an encryption key, within reach of your program, to encode/decode the password. You can then use file protection to protect access to the key.
Along those lines, if your app runs as a service/daemon (like a web server), you can put your key into a password-protected keystore, with the password entered as part of the service startup. It will take an admin to restart your app, but you will have really good protection for your configuration passwords.
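As a rough illustration of the "file protection" part of option B, here is a sketch that stores a random key in a file only its owner can access, and refuses to load it if the permissions have been loosened. It assumes a POSIX system. The function names and the file name are hypothetical, and the key would then be handed to whatever cipher you use.

```python
import os
import secrets
import stat
import tempfile


def write_key_file(path, key):
    # O_EXCL: fail rather than reuse an existing file that may have looser permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(key)


def read_key_file(path):
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        raise PermissionError("%s is accessible to group/others (mode %o)" % (path, mode))
    with open(path, "rb") as f:
        return f.read()


# Demonstration in a throwaway directory.
key_dir = tempfile.mkdtemp()
key_path = os.path.join(key_dir, "app.key")
key = secrets.token_bytes(32)
write_key_file(key_path, key)
loaded = read_key_file(key_path)
```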
Your operating system probably provides facilities for encrypting data securely. For instance, on Windows there is DPAPI (data protection API). Why not ask the user for their credentials the first time you run then squirrel them away encrypted for subsequent runs?
Here is my snippet for this kind of thing. You basically import or copy the functions into your code. getCredentials will create the credentials file if it does not exist and return a dictionary, and updateCredentials will update it. Note that the file is only base64-encoded, not actually encrypted.
import os
def getCredentials():
    import base64
    splitter='<PC+,DFS/-SHQ.R'
    directory='C:\\PCT'
    if not os.path.exists(directory):
        os.makedirs(directory)
    try:
        with open(directory+'\\Credentials.txt', 'r') as file:
            cred = file.read()
    except OSError:
        # First run: no file yet, so ask for the credentials and store them.
        print('I could not find the credentials file. \nSo I do not keep asking you for your email and password every time you run me, I will be saving a base64-encoded file at {}.\n'.format(directory))
        lanid = base64.b64encode(bytes(input(' LanID: '), encoding='utf-8')).decode('utf-8')
        email = base64.b64encode(bytes(input(' eMail: '), encoding='utf-8')).decode('utf-8')
        password = base64.b64encode(bytes(input(' PassW: '), encoding='utf-8')).decode('utf-8')
        cred = lanid+splitter+email+splitter+password
        with open(directory+'\\Credentials.txt','w+') as file:
            file.write(cred)
    # Split the stored string and decode each field back to plain text.
    return {'lanid':base64.b64decode(bytes(cred.split(splitter)[0], encoding='utf-8')).decode('utf-8'),
            'email':base64.b64decode(bytes(cred.split(splitter)[1], encoding='utf-8')).decode('utf-8'),
            'password':base64.b64decode(bytes(cred.split(splitter)[2], encoding='utf-8')).decode('utf-8')}
def updateCredentials():
    import base64
    splitter='<PC+,DFS/-SHQ.R'
    directory='C:\\PCT'
    if not os.path.exists(directory):
        os.makedirs(directory)
    print('I will be saving a base64-encoded file at {}.\n'.format(directory))
    lanid = base64.b64encode(bytes(input(' LanID: '), encoding='utf-8')).decode('utf-8')
    email = base64.b64encode(bytes(input(' eMail: '), encoding='utf-8')).decode('utf-8')
    password = base64.b64encode(bytes(input(' PassW: '), encoding='utf-8')).decode('utf-8')
    cred = lanid+splitter+email+splitter+password
    with open(directory+'\\Credentials.txt','w+') as file:
        file.write(cred)
cred = getCredentials()
updateCredentials()
Place the configuration information in an encrypted config file. Query this info in your code using a key. Place this key in a separate file per environment, and don't store it with your code.
A more homegrown approach, rather than converting the authentication details (passwords, usernames) to encrypted values. ftplib is just the example.
"pass.csv" is the csv file name
Save password in CSV like below :
user_name
user_password
(With no column heading)
Read the CSV and save it to a list.
Use the list elements as authentication details.
Full code.
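import os
import ftplib
import csv
cred_detail = []
os.chdir("Folder where the csv file is stored")
# In Python 3 the csv module expects a text-mode file opened with newline=""
with open("pass.csv", newline="") as f:
    for row in csv.reader(f):
        cred_detail.append(row)
# Row 0 holds the user name, row 1 the password
ftp = ftplib.FTP('server_name', cred_detail[0][0], cred_detail[1][0])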
Do you know pit?
https://pypi.python.org/pypi/pit (py2 only (version 0.3))
https://github.com/yoshiori/pit (it will work on py3 (current version 0.4))
test.py
from pit import Pit
config = Pit.get('section-name', {'require': {
    'username': 'DEFAULT STRING',
    'password': 'DEFAULT STRING',
}})
print(config)
Run:
$ python test.py
{'password': 'my-password', 'username': 'my-name'}
~/.pit/default.yml:
section-name:
  password: my-password
  username: my-name
If running on Windows, you could consider using win32crypt library. It allows storage and retrieval of protected data (keys, passwords) by the user that is running the script, thus passwords are never stored in clear text or obfuscated format in your code. I am not sure if there is an equivalent implementation for other platforms, so with the strict use of win32crypt your code is not portable.
I believe the module can be obtained here: http://timgolden.me.uk/pywin32-docs/win32crypt.html
You could also consider the possibility of storing the password outside the script, and supplying it at runtime
e.g. fred.py
import os
username = 'fred'
password = os.environ.get('PASSWORD', '')
print(username, password)
which can be run like
$ PASSWORD=password123 python fred.py
fred password123
Extra layers of "security through obscurity" can be achieved by using base64 (as suggested above), using less obvious names in the code and further distancing the actual password from the code.
If the code is in a repository, it is often useful to store secrets outside it, so one could add this to ~/.bashrc (or to a vault, or a launch script, ...)
export SURNAME=cGFzc3dvcmQxMjM=
and change fred.py to
import os
import base64
name = 'fred'
surname = base64.b64decode(os.environ.get('SURNAME', '')).decode('utf-8')
print(name, surname)
then re-login and
$ python fred.py
fred password123
Why not have a simple xor?
Advantages:
looks like binary data
no one can read it without knowing the key (even if it's a single char)
I get to the point where I recognize simple b64 strings for common words and rot13 as well. Xor would make it much harder.
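A minimal sketch of the XOR idea, with the result base64-encoded so it can be pasted into source code. The single-character key and the helper name are arbitrary choices of mine. Keep in mind that XOR with a short repeating key is trivial to break for anyone who tries, so this is obfuscation only.

```python
import base64
from itertools import cycle


def xor_bytes(data, key):
    """XOR every byte of data with the repeating key; applying it twice restores data."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


key = b"K"  # arbitrary example key
obscured = base64.b64encode(xor_bytes(b"password", key)).decode("ascii")
recovered = xor_bytes(base64.b64decode(obscured), key).decode("utf-8")
print(obscured)
print(recovered)
```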
There are several ROT13 utilities written in Python on the 'Net -- just google for them. ROT13 encode the string offline, copy it into the source, decode at point of transmission. But this is really weak protection...
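For what it's worth, in Python 3 no separate utility is needed: the standard codecs module ships a rot_13 text codec. A small sketch:

```python
import codecs

# Encode offline and paste the result into the source...
obscured = codecs.encode("password", "rot_13")
print(obscured)  # cnffjbeq

# ...then decode at the point of use.
restored = codecs.decode(obscured, "rot_13")
print(restored)  # password
```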
This doesn't precisely answer your question, but it's related. I was going to add as a comment but wasn't allowed.
I've been dealing with this same issue, and we have decided to expose the script to the users using Jenkins. This allows us to store the db credentials in a separate file that is encrypted and secured on a server and not accessible to non-admins.
It also allows us a bit of a shortcut to creating a UI, and throttling execution.
import base64
print(base64.b64encode("password".encode("utf-8")))
print(base64.b64decode(b'cGFzc3dvcmQ=').decode("utf-8"))
I want to encrypt a .zip file using AES256 in Python. I am aware of the Python cryptography module, in particular the example given at:
https://cryptography.io/en/latest/fernet/
However, I have needs that are a bit different:
I want to output binary data (because I want a small encrypted file). How can I output in binary instead of armored ASCII?
I do not want to have the plaintext timestamp. Any way to remove it?
If I cannot fix those points I will use another method. Any suggestions? I was considering issuing gpg commands through subprocess.
Looking at the Fernet module, it seems that it both encrypts and authenticates the data. That is actually safer than only encrypting (see here). However, with this module, removing the timestamp doesn't make sense if you also want to authenticate.
That said, it seems you want to take the risk and only encrypt instead of encrypting and authenticating. You might follow the examples from the same library at https://cryptography.io/en/latest/hazmat/primitives/symmetric-encryption/. Just make sure this is what you really want.
As you're worried about size and want to use AES, you could try AES in CTR mode, which does not need padding, avoiding extra bytes at the end.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend
backend = default_backend()
key = os.urandom(32)
nonce = os.urandom(16)
cipher = Cipher(algorithms.AES(key), modes.CTR(nonce), backend=backend)
encryptor = cipher.encryptor()
ct = encryptor.update(b"a secret message") + encryptor.finalize()
print(ct)
decryptor = cipher.decryptor()
print(decryptor.update(ct) + decryptor.finalize())
So, answering your questions:
(1) The update method already returns raw bytes.
(2) This way no plaintext data is automatically added to the ciphertext (but be aware of the security implications of not authenticating the data). However, you will still need to pass the IV (the nonce in the code above) along with the ciphertext, which you would have to do in either case.
My server encrypts files using pycrypto with AES in CTR mode. My counter is a simple counter like this:
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03
I wanna decrypt the cipher text with c++'s cryptopp library in my clients. How should I do so?
Python code:
encryptor = AES.new(
    CRYPTOGRAPHY_KEY,
    AES.MODE_CTR,
    counter=Counter.new(128),
)
cipher = encryptor.encrypt(plain_text)
C++ code so far:
byte ctr[] = "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01";
mDecryptor = new CryptoPP::CTR_Mode<CryptoPP::AES>::Decryption(key, 32, ctr);
std::string plain;
CryptoPP::StringSource(std::string(data, len), true, new CryptoPP::StreamTransformationFilter(*mDecryptor, new CryptoPP::StringSink(plain)));
but after running this plain is garbage.
Update:
Sample encrypted data you can try to decrypt with crypto++ so that you can help me even if you don't know python and you're just experienced with crypto++:
Try to decrypt this base64 encoded text:
2t0lLuSBY7NkfK5I4kML0qjcZl3xHcEQBPbDo4TbvQaXuUT8W7lNbRCl8hfSGJA00wgUXhAjQApcuTCZckb9e6EVOwsa+eLY78jo2CqYWzhGez9zn0D2LMKNmZQi88WuTFVw9r1GSKIHstoDWvn54zISmr/1JgjC++mv2yRvatcvs8GhcsZVZT8dueaNK6tXLd1fQumhXCjpMyFjOlPWVTBPjlnsC5Uh98V/YiIa898SF4dwfjtDrG/fQZYmWUzJ8k2AslYLKGs=
with this key:
12341234123412341234123412341234
with counter function described in the beginning of this post using crypto++. If you succeed post the decrypted text (which contains only numbers) and your solution please.
Update2:
I'm not providing an IV in the Python code; the Python module ignores the IV. I think the IV is what is causing the problem.
Having read their source code, I can say that PyCrypto and Crypto++ are both perfectly good cryptography libraries for Python and C++. The problem was that I was prefixing the encrypted data with some meta information about the file and had totally forgotten about it. Once the client handled that metadata, Crypto++ decrypted my files.
As I didn't find this documented explicitly anywhere (not even in Wikipedia) I write it here:
Any combination of nonce, IV and counter (concatenation, XOR, or the like) will work for CTR mode, but the convention that most libraries implement is to concatenate these values in order. So the value fed to the block cipher is usually: Nonce + IV + Counter. The counter usually starts from 1 (not 0).
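To make that layout concrete, here is a small pure-Python sketch that builds the 16-byte counter blocks as an optional prefix followed by a big-endian counter. With no prefix it reproduces the three blocks listed in the question. The function name and the example prefix are mine, not part of either library.

```python
def counter_block(counter, prefix=b"", block_size=16):
    """Return prefix followed by the counter as a big-endian integer filling the rest of the block."""
    width = block_size - len(prefix)
    return prefix + counter.to_bytes(width, "big")


# No nonce/IV prefix, counter starting at 1: the blocks from the question.
blocks = [counter_block(i) for i in (1, 2, 3)]
for block in blocks:
    print(block.hex())

# With a hypothetical 8-byte nonce, the counter occupies the remaining 8 bytes.
with_nonce = counter_block(1, prefix=b"\xaa" * 8)
print(with_nonce.hex())
```

Both sides have to agree on this layout and on the starting value, otherwise the keystreams differ and decryption yields garbage.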
I'm trying to encrypt a file using OpenPGP in Python via the pycrypto library. I've been following the sample provided in their code here: https://github.com/dlitz/pycrypto/blob/master/lib/Crypto/Cipher/CAST.py
So I'm using CAST.MODE_OPENPGP, but I can't seem to encrypt anything using a public key. My public key is well over the 16-byte limit they specify (and any generated key I've seen is over this limit as well). Is there a different value I'm supposed to use here, like the fingerprint ID?
I'm trying to read the contents of a file, encrypt it with a key, then print it into a new file to be sent (both will be deleted later on).
My code is as follows:
iv = CryptoRandom.new().read(CAST.block_size)
cipher = CAST.new(public_key, CAST.MODE_OPENPGP, iv)
file = open(filename)
contents = ''.join(file.readlines())
encrypted_contents = cipher.encrypt(contents)
encrypted_filename = filename.replace('/tmp/', '/tmp/encrypted')
encrypted_filename = encrypted_filename.replace('.csv', '.asc')
encrypted_file = open(encrypted_filename, 'w')
encrypted_file.write(encrypted_contents)
return encrypted_filename
I think you may be misunderstanding the algorithm you're using here. CAST is a symmetric-key algorithm, but whilst this implementation has an "OpenPGP mode", that doesn't mean that you simply pass your public key to it.
You should be generating a unique 16 byte key and passing that to CAST.new(). You would then generally encrypt that randomly-generated key using the public-key, and store/transmit the cipher text, and encrypted random-key together. The decryption process would decrypt the random-key using the private-key, then use the decrypted random-key to decrypt the cipher text.
I'm developing a web app (using gevent, but that is not significant) that has to write some confidential information to a log. The obvious idea is to encrypt the confidential information using a public key that is hard-coded into my application. To read it, one would need the private key, and 2048-bit RSA seems to be safe enough. I have chosen pycrypto (I tried M2Crypto as well, but found nearly no differences for my purpose) and implemented log encryption as a logging.Formatter subclass. However, I'm new to pycrypto and cryptography, and I am not sure that the way I encrypt my data is reasonable. Is the PKCS1_OAEP module what I need? Or are there friendlier ways of encrypting without dividing the data into small chunks?
So, what I did is:
import logging
import sys
from Crypto.Cipher import PKCS1_OAEP as pkcs1
from Crypto.PublicKey import RSA
PUBLIC_KEY = """ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDe2mtK03UhymB+SrIbJJUwCPhWNMl8/gA9d7jex0ciSuFfShDaqJ4wYWG4OOl\
VqKMxPrPcZ/PMSwtc021yI8TXfgewb65H/YQw4JzzGANq2+mFT8jWRDn+xUc6vcWnXIG3OPg5DvIipGQvIPNIUUP3qE7yDHnS5xdVdFrVe2bUUXmZJ9\
0xJpyqlTuRtIgfIfEQC9cggrdr1G50tXdXZjS0M1WXl5P6599oH/ykjpDFrCnh5fz9WDwUc0mNJ+11Qh+yfDp3k7AhzhRaROKLVWnfkklFaFm7LsdVX\
KPjp7dPRcTb84c2OnlIjU0ykL74Fy0K3eaPvM6TLe/K1XuD3933 pupkin#pupkin"""
PUBLIC_KEY = RSA.importKey(PUBLIC_KEY)
LOG_FORMAT = '[%(asctime)-15s - %(levelname)s: %(message)s]'
# May be more, but there is a limit.
# I suppose, the algorithm requires enough padding,
# and size of padding depends on key length.
MAX_MSG_LEN = 128
# Size of a block encoded with padding. For a 2048-bit key seems to be OK.
ENCODED_CHUNK_LEN = 256
def encode_msg(msg):
    res = []
    k = pkcs1.new(PUBLIC_KEY)
    for i in xrange(0, len(msg), MAX_MSG_LEN):
        v = k.encrypt(msg[i : i+MAX_MSG_LEN])
        # There are nicer ways to make a readable line from data than using hex. However, using
        # hex representation requires no extra code, so let it be hex.
        res.append(v.encode('hex'))
        assert len(v) == ENCODED_CHUNK_LEN
    return ''.join(res)
def decode_msg(msg, private_key):
    msg = msg.decode('hex')
    res = []
    k = pkcs1.new(private_key)
    for i in xrange(0, len(msg), ENCODED_CHUNK_LEN):
        res.append(k.decrypt(msg[i : i+ENCODED_CHUNK_LEN]))
    return ''.join(res)
class CryptoFormatter(logging.Formatter):
    NOT_SECRET = ('CRITICAL',)
    def format(self, record):
        """
        If needed, I may encode only certain types of messages.
        """
        try:
            msg = logging.Formatter.format(self, record)
            if not record.levelname in self.NOT_SECRET:
                msg = encode_msg(logging.Formatter.format(self, record))
            return msg
        except:
            import traceback
            return traceback.format_exc()
def decrypt_file(key_fname, data_fname):
    """
    The function decrypts logs and never runs on server. In fact,
    server does not have a private key at all. The only key owner
    is server admin.
    """
    res = ''
    with open(key_fname, 'r') as kf:
        pkey = RSA.importKey(kf.read())
    with open(data_fname, 'r') as f:
        for l in f:
            l = l.strip()
            if l:
                try:
                    res += decode_msg(l, pkey) + '\n'
                except Exception:  # A line may be unencrypted
                    res += l + '\n'
    return res
# Unfortunately dictConfig() does not support altering formatter class.
# Anyway, in demo code I am not going to use dictConfig().
logger = logging.getLogger()
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(CryptoFormatter(LOG_FORMAT))
logger.handlers = []
logger.addHandler(handler)
logging.warning("This is secret")
logging.critical("This is not secret")
UPDATE: Thanks to the accepted answer below, now I see:
My solution seems to be pretty valid for now (very few log entries, no performance considerations, more or less trusted storage). Concerning security, the best thing I can do right now is to remember to prohibit the user who runs my daemon from writing to the .py and .pyc files of the program. :-) However, if that user is compromised, an attacker may still try to attach a debugger to my daemon process, so I should also disable login for that user. Pretty obvious points, but very important ones.
There are surely much more scalable solutions. A very common technique is to encrypt AES keys with slow but reliable RSA, and to encrypt the data with AES, which is pretty fast. Data encryption in that case is symmetric, but retrieving the AES key requires either breaking RSA or getting the key from memory while my program is running. Stream encryption with higher-level libraries and a binary log file format are also a way to go, though a binary log encrypted as a stream would be very vulnerable to log corruption: even a sudden reboot due to a power outage may be a problem unless I do some things at a lower level (at least log rotation on each daemon start).
I changed .encode('hex') to .encode('base64').replace('\n', '').replace('\r', ''). Fortunately, the base64 codec works fine with no line ends. It saves some space.
Using untrusted storage may require signing records, but that seems to be another story.
Checking whether a string is encrypted by catching exceptions is OK since, unless the log is tampered with by a malicious user, it is the base64 codec that raises the exception, not RSA decryption.
You seem to encrypt data directly with RSA. This is relatively slow, and has the problem that you can only encrypt small parts of data. Distinguishing encrypted from plaintext data based on "decryption doesn't work" is also not a very clean solution, although it will probably work. You do use OAEP, which is good. You may want to use base64 instead of hex to save space.
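To put rough numbers on the "small parts" and "save space" remarks, here is a stdlib-only sketch. It assumes a 2048-bit key and OAEP with SHA-1 (which I believe is the PyCrypto default), for which the maximum plaintext per RSA block is the modulus size minus twice the hash size minus 2. A block of random bytes stands in for an RSA ciphertext chunk.

```python
import base64
import binascii
import hashlib
import os

KEY_BYTES = 2048 // 8                    # RSA modulus size in bytes
HASH_BYTES = hashlib.sha1().digest_size  # 20 bytes; assumes OAEP with SHA-1
max_oaep_message = KEY_BYTES - 2 * HASH_BYTES - 2

chunk = os.urandom(KEY_BYTES)            # stand-in for one encrypted chunk
hex_len = len(binascii.hexlify(chunk))
b64_len = len(base64.b64encode(chunk))

print(max_oaep_message, hex_len, b64_len)  # 214 512 344
```

So under these assumptions each chunk could carry up to 214 bytes of plaintext rather than 128, and base64 needs 344 characters per chunk where hex needs 512.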
However, crypto is easy to get wrong. For this reason, you should always use high-level crypto libraries wherever possible. Anything where you have to specify padding schemes yourself isn't "high-level". I am not sure if you will be able to create an efficient, line-based log encryption system without resorting to rather low-level libraries, though.
If you have no reason to encrypt only individual parts of the log, consider just encrypting the entire thing.
If you are really desperate for a line-based encryption, what you could do is the following: Create a random symmetric AES key from a secure randomness source, and give it a short but unique ID. Encrypt this key with RSA, and write the result to the log file in a line prefixed with a tag, e.g. "KEY", together with the ID. For each log line, generate a random IV, encrypt the message with AES256 in CBC mode using said IV (you don't have any length limits per line now!) and write the key ID, IV and the encrypted message to the log, prefixed with a tag, e.g. "ENC". After a certain time, destroy the symmetric key and repeat (generate new one, write to log). The disadvantage of this approach is that an attacker who can recover the symmetric key from memory can read the messages encrypted with said key. The advantage is that you can use higher-level building blocks and it is much, much faster (on my CPU, you can encrypt 70,000 log lines of 1 KB per second with AES-128, but only around 3,500 chunks of max. 256 bytes with RSA2048). RSA decryption is REALLY slow, by the way (around 100 chunks per second).
Note that you have no authentication, i.e. you won't notice modifications to your logs. For this reason, I assume you trust the log storage. Otherwise, see RFC 5848.
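If you only want to notice accidental or casual modification, a much simpler substitute for RFC 5848 (not an implementation of it) is to append a chained HMAC to every line. The sketch below uses only the standard library. The key, the tab-separated line format and the function names are assumptions of mine. It does not detect lines removed from the end of the log, and anyone who obtains the HMAC key from the server can forge entries.

```python
import hashlib
import hmac


def sign_lines(lines, key):
    """Yield 'line<TAB>tag', where each tag also covers the previous tag."""
    prev = b""
    for line in lines:
        tag = hmac.new(key, prev + line.encode("utf-8"), hashlib.sha256).hexdigest()
        yield "{}\t{}".format(line, tag)
        prev = tag.encode("ascii")


def verify_lines(signed, key):
    """Return True if every line carries the tag expected at its position in the chain."""
    prev = b""
    for entry in signed:
        line, _, tag = entry.rpartition("\t")
        expected = hmac.new(key, prev + line.encode("utf-8"), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(tag, expected):
            return False
        prev = tag.encode("ascii")
    return True


key = b"demo-key-not-for-production"
signed = list(sign_lines(["first entry", "second entry"], key))
tampered = [signed[0].replace("first", "FIRST"), signed[1]]
reordered = [signed[1], signed[0]]

print(verify_lines(signed, key))
print(verify_lines(tampered, key))
print(verify_lines(reordered, key))
```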