I keep important settings like the hostnames and ports of development and production servers in my version control system. But I know that it's bad practice to keep secrets (like private keys and database passwords) in a VCS repository.
But passwords--like any other setting--seem like they should be versioned. So what is the proper way to keep passwords version controlled?
I imagine it would involve keeping the secrets in their own "secrets settings" file and having that file encrypted and version controlled. But what technologies? And how to do this properly? Is there a better way entirely to go about it?
I ask the question generally, but in my specific instance I would like to store secret keys and passwords for a Django/Python site using git and github.
Also, an ideal solution would do something magical when I push/pull with git--e.g., if the encrypted passwords file changes a script is run which asks for a password and decrypts it into place.
EDIT: For clarity, I am asking about where to store production secrets.
You're exactly right to want to encrypt your sensitive settings file while still maintaining the file in version control. As you mention, the best solution would be one in which Git will transparently encrypt certain sensitive files when you push them so that locally (i.e. on any machine which has your certificate) you can use the settings file, but Git or Dropbox or whoever is storing your files under VC does not have the ability to read the information in plaintext.
Tutorial on Transparent Encryption/Decryption during Push/Pull
This gist https://gist.github.com/873637 shows a tutorial on how to use Git's smudge/clean filter driver with openssl to transparently encrypt pushed files. You just need to do some initial setup.
Summary of How it Works
You'll basically be creating a .gitencrypt folder containing 3 bash scripts,
clean_filter_openssl
smudge_filter_openssl
diff_filter_openssl
which are used by Git for decryption, encryption, and supporting Git diff. A master passphrase and salt (fixed!) are defined inside these scripts, and you MUST ensure that .gitencrypt is never actually pushed.
Example clean_filter_openssl script:
#!/bin/bash
SALT_FIXED=<your-salt> # 24 or fewer hex characters
PASS_FIXED=<your-passphrase>
openssl enc -base64 -aes-256-ecb -S $SALT_FIXED -k $PASS_FIXED
The smudge_filter_openssl and diff_filter_openssl scripts are similar. See the Gist.
Your repo with sensitive information should have a .gitattributes file (unencrypted and included in the repo) which references the .gitencrypt directory, which is present only on your local machine and contains everything Git needs to encrypt/decrypt the project transparently.
.gitattributes contents:
* filter=openssl diff=openssl
[merge]
renormalize = true
Finally, you will also need to add the following content to your .git/config file
[filter "openssl"]
smudge = ~/.gitencrypt/smudge_filter_openssl
clean = ~/.gitencrypt/clean_filter_openssl
[diff "openssl"]
textconv = ~/.gitencrypt/diff_filter_openssl
Now, when you push the repository containing your sensitive information to a remote repository, the files will be transparently encrypted. When you pull from a local machine which has the .gitencrypt directory (containing your passphrase), the files will be transparently decrypted.
Notes
I should note that this tutorial does not describe a way to only encrypt your sensitive settings file. This will transparently encrypt the entire repository that is pushed to the remote VC host and decrypt the entire repository so it is entirely decrypted locally. To achieve the behavior you want, you could place sensitive files for one or many projects in one sensitive_settings_repo. You could investigate how this transparent encryption technique works with Git submodules http://git-scm.com/book/en/Git-Tools-Submodules if you really need the sensitive files to be in the same repository.
The use of a fixed passphrase could theoretically lead to brute-force vulnerabilities if attackers had access to many encrypted repos/files. IMO, the probability of this is very low. As a note at the bottom of this tutorial mentions, not using a fixed passphrase will result in local versions of a repo on different machines always showing that changes have occurred with 'git status'.
Heroku pushes the use of environment variables for settings and secret keys:
The traditional approach for handling such config vars is to put them under source - in a properties file of some sort. This is an error-prone process, and is especially complicated for open source apps which often have to maintain separate (and private) branches with app-specific configurations.
A better solution is to use environment variables, and keep the keys out of the code. On a traditional host or working locally you can set environment vars in your bashrc. On Heroku, you use config vars.
With Foreman and .env files, Heroku provide an enviable toolchain to export, import and synchronise environment variables.
Personally, I believe it's wrong to save secret keys alongside code. It's fundamentally inconsistent with source control, because the keys are for services extrinsic to the code. The one boon would be that a developer can clone HEAD and run the application without any setup. However, suppose a developer checks out a historic revision of the code. Their copy will include last year's database password, so the application will fail against today's database.
With the Heroku method above, a developer can checkout last year's app, configure it with today's keys, and run it successfully against today's database.
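As a rough sketch of this approach in a Django settings.py (the variable names below are just examples, not anything Heroku or Django mandates):

# settings.py -- read secrets from the environment instead of the repo
import os

SECRET_KEY = os.environ['DJANGO_SECRET_KEY']  # fails loudly if unset
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': os.environ.get('DATABASE_NAME', 'myapp'),
        'USER': os.environ.get('DATABASE_USER', ''),
        'PASSWORD': os.environ.get('DATABASE_PASSWORD', ''),
        'HOST': os.environ.get('DATABASE_HOST', 'localhost'),
    }
}

The same settings file then works locally (with env vars set in your bashrc or a .env file) and on Heroku (with config vars).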
The cleanest way in my opinion is to use environment variables. You won't have to deal with .dist files for example, and the project state on the production environment would be the same as your local machine's.
I recommend reading The Twelve-Factor App's config chapter, the others too if you're interested.
I suggest using configuration files for that and not versioning them.
You can however version examples of the files.
I don't see any problem with sharing development settings. By definition they should contain no valuable data.
An option would be to put project-bound credentials into an encrypted container (TrueCrypt or Keepass) and push it.
Update as answer from my comment below:
Interesting question, btw. I just found this: github.com/shadowhand/git-encrypt, which looks very promising for automatic encryption.
Since asking this question I have settled on a solution, which I use when developing small application with a small team of people.
git-crypt
git-crypt uses GPG to transparently encrypt files when their names match certain patterns. For instance, if you add to your .gitattributes file...
*.secret.* filter=git-crypt diff=git-crypt
...then a file like config.secret.json will always be pushed to remote repos with encryption, but remain unencrypted on your local file system.
If you want to add a new GPG key (a person) to the repo who can decrypt the protected files, run git-crypt add-gpg-user <gpg_user_key>. This creates a new commit. The new user will then be able to decrypt subsequent commits.
BlackBox was recently released by StackExchange and while I have yet to use it, it seems to exactly address the problems and support the features requested in this question.
From the description on https://github.com/StackExchange/blackbox:
Safely store secrets in a VCS repo (i.e. Git or Mercurial). These commands make it easy for you to GPG encrypt specific files in a repo so they are "encrypted at rest" in your repository. However, the scripts make it easy to decrypt them when you need to view or edit them, and decrypt them for use in production.
I ask the question generally, but in my specific instance I would like
to store secret keys and passwords for a Django/Python site using git
and github.
No, just don't, even if it's your private repo and you never intend to share it, don't.
You should create a local_settings.py, put it on the VCS ignore list, and in your settings.py do something like
from local_settings import DATABASES, SECRET_KEY
DATABASES = DATABASES
SECRET_KEY = SECRET_KEY
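For illustration, local_settings.py (ignored by the VCS) might look roughly like this; every value below is a placeholder, and nothing beyond the names DATABASES and SECRET_KEY is required by Django:

# local_settings.py -- never committed; lives only on the machine that needs it
SECRET_KEY = 'replace-me-with-a-real-secret'

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mydb',
        'USER': 'myuser',
        'PASSWORD': 'replace-me-too',
        'HOST': 'localhost',
    }
}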
If your secret settings are that volatile, I'd go so far as to say you're doing something wrong.
EDIT: I assume you want to keep track of your previous password versions - say, for a script that would prevent password reuse, etc.
I think GnuPG is the best way to go - it's already used in one git-related project (git-annex) to encrypt repository contents stored on cloud services. GnuPG (GNU Privacy Guard) provides very strong key-based encryption.
You keep a key on your local machine.
You add 'mypassword' to ignored files.
On pre-commit hook you encrypt the mypassword file into the mypassword.gpg file tracked by git and add it to the commit.
On post-merge hook you just decrypt mypassword.gpg into mypassword.
Now if your 'mypassword' file did not change, then encrypting it will result in the same ciphertext and it won't be added to the index (no redundancy). The slightest modification of mypassword results in radically different ciphertext, and mypassword.gpg in the staging area differs a lot from the one in the repository, thus it will be added to the commit. Even if the attacker gets hold of your gpg key, he still needs to brute-force the password. If the attacker gets access to the remote repository with the ciphertext, he can compare a bunch of ciphertexts, but their number won't be sufficient to give him any non-negligible advantage.
Later on you can use .gitattributes to provide on-the-fly decryption for a quick git diff of your password.
Also you can have separate keys for different types of passwords etc.
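As a rough sketch of the pre-commit step described above (git hooks can be any executable, so this one uses Python and shells out to gpg; the recipient address and file names are placeholders):

#!/usr/bin/env python
# .git/hooks/pre-commit -- encrypt the ignored 'mypassword' file and stage the result
import subprocess
import sys

# Encrypt mypassword -> mypassword.gpg for the key 'you@example.com' (placeholder)
subprocess.check_call([
    'gpg', '--yes', '--batch', '--output', 'mypassword.gpg',
    '--encrypt', '--recipient', 'you@example.com', 'mypassword',
])

# Stage the ciphertext so it goes into the commit
subprocess.check_call(['git', 'add', 'mypassword.gpg'])
sys.exit(0)

The post-merge hook would do the reverse: decrypt mypassword.gpg back into the ignored mypassword file.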
Usually, I separate passwords out into a config file, and ship a .dist version of it.
/yourapp
main.py
default.cfg.dist
And when I run main.py, I put the real password into a copy named default.cfg.
P.S. When you work with git or hg, you can ignore *.cfg files by adding them to .gitignore or .hgignore.
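A minimal sketch of how main.py might read that copied config (the section and key names are just examples, using Python 3's configparser):

# main.py -- reads the real credentials from default.cfg, which is copied
# from default.cfg.dist and ignored by the VCS
import configparser

config = configparser.ConfigParser()
config.read('default.cfg')

db_password = config.get('database', 'password')  # assumes a [database] section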
Provide a way to override the config
This is the best way to manage a set of sane defaults for the config you check in, without requiring the config to be complete or to contain things like hostnames and credentials. There are a few ways to override default configs.
Environment variables (as others have already mentioned) are one way of doing it.
The best way is to look for an external config file that overrides the default config values. This allows you to manage the external configs via a configuration management system like Chef, Puppet or Cfengine. Configuration management is the standard answer for the management of configs separate from the codebase so you don't have to do a release to update the config on a single host or a group of hosts.
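One hedged sketch of the "defaults plus optional override file" idea in Python (the paths and key names are illustrative only):

# config.py -- ship sane defaults in the repo, let ops override them outside it
import json
import os

DEFAULTS = {'db_host': 'localhost', 'db_password': ''}

def load_config(override_path='/etc/myapp/config.json'):
    config = dict(DEFAULTS)
    if os.path.exists(override_path):
        # The override file is managed by Chef/Puppet/Cfengine, not by the repo
        with open(override_path) as f:
            config.update(json.load(f))
    return config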
FYI: Encrypting creds is not always a best practice, especially in a place with limited resources. It may be the case that encrypting creds will gain you no additional risk mitigation and simply add an unnecessary layer of complexity. Make sure you do the proper analysis before making a decision.
Encrypt the passwords file, using for example GPG. Add the keys on your local machine and on your server. Decrypt the file and put it outside your repo folders.
I use a passwords.conf, located in my home folder. On every deploy this file gets updated.
No, private keys and passwords do not fall under revision control. There is no reason to burden everyone with read access to your repository with knowing sensitive service credentials used in production, when most likely not all of them should have access to those services.
Starting with Django 1.4, Django projects ship with a project.wsgi module that defines the application object, and it's a perfect place to start enforcing the use of a project.local settings module that contains site-specific configuration.
This settings module is ignored by revision control, but its presence is required when running your project instance as a WSGI application, as is typical for production environments. This is how it should look:
import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "project.local")
# This application object is used by the development server
# as well as any WSGI server configured to use this file.
from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()
Now you can have a local.py module whose owner and group can be configured so that only authorized personnel and the Django processes can read the file's contents.
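For illustration, project/local.py might look roughly like this; all values are placeholders, and it assumes the shared, versioned settings live in project.settings:

# project/local.py -- site-specific settings, never committed
from project.settings import *  # start from the shared, versioned settings

DEBUG = False
SECRET_KEY = 'the-real-production-secret'
DATABASES['default']['PASSWORD'] = 'the-real-db-password'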
If you need VCS for your secrets you should at least keep them in a second repository, separated from your actual code. That way you can give your team members access to the source code repository and they won't see your credentials. Furthermore, host this repository somewhere else (e.g. on your own server with an encrypted filesystem, not on github), and for checking it out to the production system you could use something like git-submodule.
This is what I do:
Keep all secrets as env vars in $HOME/.secrets (go-r perms) that $HOME/.bashrc sources (this way if you open .bashrc in front of someone, they won't see the secrets)
Configuration files are stored in VCS as templates, such as config.properties stored as config.properties.tmpl
The template files contain a placeholder for the secret, such as:
my.password=##MY_PASSWORD##
On application deployment, a script is run that transforms the template file into the target file, replacing placeholders with the values of environment variables, such as changing ##MY_PASSWORD## to the value of $MY_PASSWORD.
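A rough sketch of such a deployment script (the ##NAME## convention and file names follow the example above; nothing else is prescribed):

# render_config.py -- turn config.properties.tmpl into config.properties,
# replacing ##VAR## placeholders with the values of the matching env vars
import os
import re

with open('config.properties.tmpl') as f:
    template = f.read()

rendered = re.sub(r'##(\w+)##', lambda m: os.environ[m.group(1)], template)

with open('config.properties', 'w') as f:
    f.write(rendered)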
Another approach could be to completely avoid saving secrets in version control systems and instead use a tool like Vault from HashiCorp, a secrets store with key rolling and auditing, with an API and embedded encryption.
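A minimal sketch of fetching a secret from Vault with the hvac client library; the URL, token, and secret path are placeholders, and the exact read API depends on the hvac and Vault versions you run:

# Fetch a secret from Vault at run time instead of storing it in the repo
import hvac

client = hvac.Client(url='https://vault.example.com:8200',
                     token='replace-with-a-real-token')
secret = client.read('secret/myapp/db')           # KV v1-style path (assumption)
db_password = secret['data']['password']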
You could use EncFS if your system provides it. That way you could keep your encrypted data as a subfolder of your repository, while providing your application a decrypted view of the data mounted alongside. As the encryption is transparent, no special operations are needed on pull or push.
You would, however, need to mount the EncFS folder, which could be done by your application based on a password stored outside the versioned folders (e.g. in environment variables).
I wrote a Flask web application for a system that our company uses. However, we have another web application, which is running on Node.js. The "problem" is that my colleague writes everything in Node, while I write everything in Python.
We want to implement both applications on one webpage - for example:
My application will run on example.com/assistant
His application will run on example.com/app1 and example.com/app2
How can we do this? Can we somehow implement the templates that I use with his templates and vice-versa?
Thank you in advance!
V
Serving different apps from the same domain
You can use haproxy for directing requests to a specific service based on ACL rules.
You can use a path_beg rule to direct any request beginning with a specific path to the corresponding backend. See the example below.
/etc/haproxy/haproxy.cfg
# only relevant part of the config file
# assumes all apps are on one machine
frontend http-in
    bind *:80
    acl py_app1 path_beg /assistant
    acl node_app1 path_beg /app1
    acl node_app2 path_beg /app2
    use_backend py_app1 if py_app1
    use_backend node_app1 if node_app1
    use_backend node_app2 if node_app2
    default_backend main_servers

backend py_app1
    server flask_app 127.0.0.1:5000

backend node_app1
    server nodejs1 127.0.0.1:4001

backend node_app2
    server nodejs2 127.0.0.1:4002

backend main_servers
    server other1 127.0.0.1:3000 # nginx, apache, or whatever
Sharing template code between apps
This would be harder, as you would both need to agree on some kind of format, which needs to be language- and framework-agnostic, and probably logic-less.
Mustache claims to be a "framework-agnostic way to render logic-free views". I used it sparingly a few years ago, so it's the first that came to mind; however, you should do more research on this, as there may be a better fit.
Python implementation
JS implementation
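For example, with pystache (one Python implementation of Mustache), rendering a shared logic-less template looks roughly like this:

# Render a shared Mustache template from Python; the Node app can render
# the same template file with a JS Mustache implementation
import pystache

template = '<h1>Hello, {{name}}!</h1>'
print(pystache.render(template, {'name': 'world'}))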
The problem would be actually keeping the templates in sync with the apps without breaking the functionality of the views. If a template changes, you would need to test all apps that use that template file. You would also probably block one another from updating your apps at different times, because if one of you changes the template files, you must come to a consensus, update all relevant apps, and deploy them at the same time.
So, I am running a Flask web app as the frontend to a small SQLite inventory database. I am using Flask-SQLAlchemy for all of the database interaction. I have a search feature built in, so that you could (for example) find all pieces of hardware assigned to Person X. Or, find all pieces of hardware from Project ABC that have the status Available. The search feature supports optional criteria.
For security reasons, I can't post executable code here, but this is my search function:
# and_ comes from SQLAlchemy (from sqlalchemy import and_); an empty criterion
# becomes LIKE '%%', which matches every row for that column
for system in db.session.query(System).filter(and_(
        System.assignee_id.like("%{}%".format(assignee_id)),
        System.serial.like("%{}%".format(serial)),
        System.user_id.like("%{}%".format(user_id)),
        System.project_id.like("%{}%".format(project_id)),
        System.status_id.like("%{}%".format(status_id)))):
    results.append(system)
The weird thing is, this works perfectly when I run the app in the Development config. But, in my Production config, if I try to use the 'Assignee' or 'User' criteria (both of which are strings of an email address), I get bad results. It shows me a bunch of systems, some of which have the correct assignee, but most of which don't. The only differences between the Production and Development configs are the file path to the SQLite database and the authentication mechanism to sign into the application.
Why would this return different results between the two? It works EXACTLY like it is supposed to in the Dev config.
UPDATE
After some additional debugging as suggested by the comments, it looks like the problem is in the actual data of the ProductionConfig. However, when I browse the .sqlite file in DB Browser, initially everything looks correct. Where is it getting these random assignees?
I have an existing website deployed on Google App Engine for Python. Now I have set up the local development server on my system. But I don't know how to get the updated database from the live server. There is no export option in Google's developer console.
Also, I don't want to read the data for each request from the production Datastore; I want to set it up locally once. The Google manual says that the local datastore is stored in an SQLite file.
Any hint would be appreciated.
First, make sure your app.yaml enables the "remote" built-in, with a stanza such as:
builtins:
- remote_api: on
This app.yaml of course must be the one deployed to your appspot.com (or whatever) "production" GAE app.
Then, it's a job for /usr/local/google_appengine/bulkloader.py or wherever you may have installed the bulkloader component. Run it with -h to get a list of the many, many options you can pass.
You may need to generate an application-specific password for this use on your google accounts page. Then, the general use will be something like:
/usr/local/google_appengine/bulkloader.py --dump --url=http://your_app.appspot.com/_ah/remote_api --filename=allkinds.sq3
You may not (yet) be able to use this "all kinds" query -- the server only generates the needed statistics for the all-kinds query "periodically", so you may get an error message including info such as:
[ERROR ] Unable to download kind stats for all-kinds download.
[ERROR ] Kind stats are generated periodically by the appserver
[ERROR ] Kind stats are not available on dev_appserver.
If that's the case, then you can still get things "one kind at a time" by adding the option --kind=EntityKind and running the bulkloader repeatedly (with separate sqlite3 result files) for each kind of entity.
Once you've dumped (kind by kind if you have to, all at once if you can) the production datastore, you can use the bulkloader again, this time with --restore and addressing your localhost dev_appserver instance, to rebuild the latter's datastore.
It should be possible to explicitly list kinds in the --kind flag (by separating them with commas and putting them all in parentheses) but unfortunately I think I've found a bug stopping that from working -- I'll try to get it fixed but don't hold your breath. In any case, this feature is not documented (I just found it by studying the open-source release of bulkloader.py) so it may be best not to rely on it!-)
More info about the then-new bulkloader can be found in a blog post by Nick Johnson at http://blog.notdot.net/2010/04/Using-the-new-bulkloader (though it doesn't cover newer functionalities such as the sqlite3 format of results in the "zero configuration" approach I outlined above). There's also a demo, with plenty of links, at http://bulkloadersample.appspot.com/ (also a bit outdated, alas).
Check out the remote API. This will tunnel your database calls over HTTP to the production database.
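A hedged sketch of configuring the remote API from Python code (old Python 2 SDK; the app id and credentials below are placeholders):

# Talk to the production datastore from a local Python shell via remote_api
from google.appengine.ext.remote_api import remote_api_stub

def auth_func():
    # Returns (email, password); an application-specific password may be needed
    return ('you@example.com', 'app-specific-password')

remote_api_stub.ConfigureRemoteApi(None, '/_ah/remote_api', auth_func,
                                   'your_app.appspot.com')

# From here, normal datastore queries run against the production datastore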
I want to use Celery to implement a task queue to perform long(ish) running tasks like interacting with external APIs (e.g. Twilio for SMS sending). However, I use different API credentials in production and in development.
I can't figure out how to statically configure Celery (i.e. from the commandline) to pass in the appropriate API credentials. Relatedly, how does my application code (which launches Celery tasks) specify which Celery queue to talk to if there are both development and production queues?
Thanks for any help you can offer.
Avi
EDIT: additional bonus for a working example of how to use the --config option of celery.
The way that I do it is using an environment variable. As a simple example...
import os
from celery import Celery

# By convention, my configuration files are in a "configs/XXX.ini" file, with
# XXX being the configuration name (e.g., "staging.ini")
config_filename = os.path.join('configs', os.environ['CELERY_CONFIG'] + '.ini')
configuration = read_config_file(config_filename)  # your own .ini parsing helper

# Now you can create the Celery object using your configuration...
celery = Celery('mymodule', broker=configuration['CELERY_BROKER_URL'])

@celery.task
def add_stuff(x, y):
    ...
You end up running from the command line like so...
export CELERY_CONFIG=staging
celery -A mymodule worker
This question has an example of doing something like this, but they say "how can I do this in a way that is not so ugly?" As far as I'm concerned, this is quite acceptable, and not "ugly" at all.
According to the twelve factor app, you should use environment variables instead of command line parameters.
This is especially true if you are using sensitive information like access credentials, because they are visible in the ps output. The other idea (storing credentials in config files) is far from ideal, because you should avoid storing sensitive information in the VCS.
That is why many container services and PaaS providers favor this approach: easier instrumentation and automated deployments.
You may want to take a look at Python Deployment Anti-patterns.