Heroku: Python dependencies in private repos without storing my password

The Problem
My problem is exactly like How do I install in-house requirements for Python Heroku projects? and How to customize pip's requirements.txt in Heroku on deployment? Namely, I have a private repo from which I need a Python dependency installed into my Heroku app. The canonical answer, given by Heroku's own Kenneth Reitz, is to put something like
-e git+https://username:password@github.com/kennethreitz/requests.git@v0.10.0#egg=requests
in your requirements.txt file.
My security needs prevent my storing my password in a repo. (I also do not want to put the dependency inside my app's repo; they're separate pieces of software and need to be in separate repos.) The only place I can give my password (or, preferably, a GitHub OAuth token or deploy key) to Heroku is in an environment variable like
heroku config:add GITHUB_OAUTH_TOKEN=12312312312313
Attempted Solutions
I could use a custom .profile in my app's repo, but then I'd be downloading and installing my dependency each time a process (web, worker, etc.) restarts.
This leaves using a custom buildpack and the Heroku Labs add-on that exposes my heroku config environment before the buildpack compiles. I tried building one on top of Buildpack Multi. The idea is that Buildpack Multi is the primary buildpack; using the .buildpacks file in my app's repo, it first runs the normal Heroku Python buildpack, then my custom one.
The trouble is that even after Buildpack Multi successfully runs the Python buildpack, the Python binary and pip are not visible to my custom buildpack when it runs, so it fails outright. (In my tests, the GITHUB_OAUTH_TOKEN environment variable was correctly exposed to the buildpacks.)
The only other thing I can think to try is to make my own fork of the Python buildpack that installs my dependency when it installs everything from requirements.txt, or even rewrites requirements.txt directly. Both of these seem like really heavy solutions to what I would think is a very common problem.
Update: Current Workaround
My custom buildpack (linked above) now downloads and saves my closed-source dependency ("foo") into the vendor directory that the geos buildpack uses. I committed foo's own dependencies into my app's requirements.txt, so pip installs foo's dependencies through my app's requirements.txt, and the buildpack adds the vendored copy of foo to my app's PYTHONPATH (so foo's setup.py install never runs).
The biggest problem with this approach is that it couples my (admittedly badly written) buildpack to my app. The second problem is that my app's requirements.txt should just list foo as a dependency and leave foo's dependencies for foo to determine. Lastly, there is no good way to give my future self (six months from now, when I've forgotten how all this works) an error message if I forget to set the GITHUB_OAUTH_TOKEN environment variable, and even less useful feedback if the token expires while the variable still exists but is no longer valid.
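On that last point, a guard at the top of the buildpack's compile script can at least make the build fail loudly. A minimal sketch, assuming the newer buildpack API where config vars are passed as files in the ENV_DIR argument (with the user-env-compile lab they appear as ordinary environment variables instead); this can only detect a missing token, not an expired one:
#!/usr/bin/env bash
# bin/compile <BUILD_DIR> <CACHE_DIR> <ENV_DIR>
ENV_DIR="$3"
# Fail fast with a readable message if the token was never configured.
if [ ! -r "$ENV_DIR/GITHUB_OAUTH_TOKEN" ]; then
    echo " !     GITHUB_OAUTH_TOKEN is not set; run: heroku config:add GITHUB_OAUTH_TOKEN=..." >&2
    exit 1
fi
GITHUB_OAUTH_TOKEN="$(cat "$ENV_DIR/GITHUB_OAUTH_TOKEN")"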
Cry for Help
What (likely obvious) thing am I missing? How have you solved this problem in your apps? Any suggestions on getting my buildpack to work, or, better yet, an even simpler solution?

I created a buildpack to solve this problem using a custom SSH key stored as an environment variable. As the buildpack is technology-agnostic, it can be used to download dependencies with any tool: Composer for PHP, Bundler for Ruby, npm for JavaScript, etc.: https://github.com/simon0191/custom-ssh-key-buildpack
Add the buildpack to your app:
$ heroku buildpacks:add --index 1 https://github.com/simon0191/custom-ssh-key-buildpack
Generate a new SSH key (let's say you name it deploy_key).
Add the public key to your private repository account. For example:
Github: https://help.github.com/articles/adding-a-new-ssh-key-to-your-github-account/
Bitbucket: https://confluence.atlassian.com/bitbucket/add-an-ssh-key-to-an-account-302811853.html
Encode the private key as a base64 string and add it as the CUSTOM_SSH_KEY environment variable of the Heroku app.
Make a comma-separated list of the hosts for which the SSH key should be used and add it as the CUSTOM_SSH_KEY_HOSTS environment variable of the Heroku app.
# MacOS
$ heroku config:set CUSTOM_SSH_KEY=$(base64 --input ~/.ssh/deploy_key) CUSTOM_SSH_KEY_HOSTS=bitbucket.org,github.com
# Ubuntu
$ heroku config:set CUSTOM_SSH_KEY=$(base64 ~/.ssh/deploy_key) CUSTOM_SSH_KEY_HOSTS=bitbucket.org,github.com
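One caveat of my own, not from the buildpack's instructions: GNU base64 on Linux wraps its output at 76 columns, and the unquoted $(...) substitution then splits into several arguments and breaks the command. Passing -w 0 keeps the value on one line:
# Ubuntu, with line wrapping disabled
$ heroku config:set CUSTOM_SSH_KEY=$(base64 -w 0 ~/.ssh/deploy_key) CUSTOM_SSH_KEY_HOSTS=bitbucket.org,github.com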
Deploy your app and enjoy :)

I faced the same problem. Like you, I am amazed how difficult it is to find good documentation on how to install private dependencies (whatever the language and the service used).
Because this is not a main concern of service providers, I now take a systematic approach that relies as little as possible on idiosyncratic features. I try to find the easiest solution for each of these steps:
Pass the credentials to the build environment using a secure channel. For Python, use an environment variable containing an SSH key as a base64 string. For JS, do the same with the npm token.
Configure the build process to use these credentials. In the best case this involves configuring ssh to use a deploy key. Otherwise it can be as basic as cloning the dependency for later use. For your specific case with Python and Heroku, you can use the 'pre_compile' hook.
I detailed the process for my future self here: https://gist.github.com/michelbl/a6163522d95540cf0c8b6667bd35d5f5
I need to give access to a private dependency. This can happen for continuous integration or deployment.
Here we use Python and GitHub, with the services CircleCI and Heroku. However, the principles apply everywhere.
What is a deploy key?
See https://developer.github.com/v3/guides/managing-deploy-keys/
There are four ways of granting access to a private dependency; deploy keys are a good compromise in terms of security and ease of use for projects that do not require too many dependencies (if yours does, prefer a machine user). In any case, do not use the username/password of a developer account or an OAuth token, as they do not provide privilege limitation.
Create a deploy key:
ssh-keygen -t rsa -b 4096 -C "myself@my_company.com"
Give the public part to GitHub.
Give the private part to the service needing access. See below.
General strategy
Whatever the service or technology I use, the goal is to access the git repo over ssh using the deploy key.
Obviously, I do not want to put the deploy key in the repo. But most services (CI, deployment) provide a way to set protected environment variables that can be used at build time. The key can be encoded using base64:
cat deploy-key | base64
cat deploy-key.pub | base64
Most services also provide a way to tailor the build procedure. This is needed to configure ssh to use the deploy key.
CircleCI
Set the deploy key using env variables, encoded with base64.
In config.yml, add a step:
echo $DEPLOY_KEY_PRIVATE | base64 --decode > ~/.ssh/deploy-key
chmod 400 ~/.ssh/deploy-key
echo $DEPLOY_KEY_PUBLIC | base64 --decode > ~/.ssh/deploy-key.pub
ssh-add ~/.ssh/deploy-key
# Run this to check which private key is used. If the checkout key is used,
# github replies "Hi my_org/my_package". If the deploy key is used as wished,
# github replies "Hi my_org/my_dependency".
#ssh -i ~/.ssh/deploy-key -T git@github.com || true
# Now pip connects to git+ssh using the deploy key
export GIT_SSH_COMMAND="ssh -i ~/.ssh/deploy-key"
pip install -r requirements.txt
requirements.txt can be something like:
# The purpose of this file is to install the private dependency *before*
# setup.py is run.
# Be sure ssh is configured to use a ssh key with read permission to the repo.
git+ssh://git@github.com/my_org/my_dependency@1.0.10
# Run setup.py. The private dependency is already installed with the good
# version so pip doesn't try to fetch it from PyPI.
--editable .
and setup.py does not care about the dependency being private:
from distutils.core import setup

setup(
    name='my_package',
    version='1.0',
    packages=[
        'my_package',
    ],
    install_requires=[
        # Beware, the following package is a private dependency.
        # Python provides several ways to install private dependencies, none
        # of them really satisfactory.
        # 1. Use dependency_links / --process-dependency-links. Good luck with
        #    that!
        # 2. Maintain a private package repository. Good luck with that!
        # 3. Install the private dependency separately before setup.py is run.
        #    This is now the preferred way. Be sure that ssh is properly
        #    configured to use an ssh key with read permission to the github
        #    repo of the private dependency, then run:
        #    `pip install -r requirements.txt`
        'my_dependency==1.0.10',
        ...  # my normal dependencies
        'unidecode==1.0.22',
        'uwsgi==2.0.15',
        'nose==1.3.7',  # tests
        'flake8==3.5.0',  # style
    ],
)
Heroku
For Python, there is no need to write a custom buildpack. First, set the deploy key using env variables, encoded with base64.
Then add the hook bin/pre_compile:
# This script configures ssh on Heroku to use the deploy key.
# This is needed to install private dependencies.
#
# Note that this does not work with Heroku review apps. Review apps can
# inherit env variables from their parents, but they only get access to the
# values after the build. You would need to pass the ssh key to this script
# some other way.
#
# See also
# * https://stackoverflow.com/questions/21297755/heroku-python-dependencies-in-private-repos-without-storing-my-password#
# * https://github.com/bjeanes/ssh-private-key-buildpack
# Ensure we have an ssh folder
if [ ! -d ~/.ssh ]; then
    mkdir -p ~/.ssh
    chmod 700 ~/.ssh
fi
# Create the key files
cat "$ENV_DIR/DEPLOY_KEY" | base64 --decode > ~/.ssh/deploy-key
chmod 400 ~/.ssh/deploy-key
# The public half lives in its own config var (not strictly needed for auth).
cat "$ENV_DIR/DEPLOY_KEY_PUBLIC" | base64 --decode > ~/.ssh/deploy-key.pub
#ssh-add ~/.ssh/deploy-key
# If you want to disable host verification, you could use that.
#ssh -oStrictHostKeyChecking=no -T git@github.com 2>&1
# Run that if you want to check that ssh uses the correct key.
#ssh -i ~/.ssh/deploy-key -T git@github.com || true
# Configure ssh to use the correct deploy key when connecting to github.
# Disables host verification.
echo -e "Host github.com\n"\
" IdentityFile ~/.ssh/deploy-key\n"\
" IdentitiesOnly yes\n"\
" UserKnownHostsFile=/dev/null\n"\
" StrictHostKeyChecking no"\
>> ~/.ssh/config
# Unfortunately this does not seem to work.
#export GIT_SSH_COMMAND="ssh -i ~/.ssh/deploy-key"
# The vanilla Python buildpack can now install all the dependencies in
# requirements.txt.
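For completeness, the config side of this might look as follows (my wording; the DEPLOY_KEY and DEPLOY_KEY_PUBLIC names match the script above, and -w 0 is the GNU base64 flag that disables line wrapping, not needed on macOS):
$ heroku config:set DEPLOY_KEY=$(base64 -w 0 ~/.ssh/deploy-key) DEPLOY_KEY_PUBLIC=$(base64 -w 0 ~/.ssh/deploy-key.pub)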

Create a private PyPI server
If you create your own PyPI server, you can simply list your packages in your requirements.txt file and then store the url for your server (including username and password) in the config variable, PIP_EXTRA_INDEX_URL.
For example:
heroku config:set PIP_EXTRA_INDEX_URL='https://username:password#privateserveraddress.com/simple'
Note that this is the same as using the pip install command line option, --extra-index-url. (See https://pip.pypa.io/en/stable/user_guide/#environment-variables)
The primary index url will still be the default (https://pypi.org/simple). This means that pip will first attempt to resolve package names in your requirements file at the default PyPI server, and then try your private server second.
If you need packages in your private server that have the same name as packages in PyPI, then you need the primary index url to be your server and the --extra-index-url option to be the default server's url. You would need to do this if you want to host your own version of an existing package without changing the package name. I haven't tried this, but it currently looks like you would need to create a fork of Heroku's official Python buildpack and make a small change to the bin/steps/pip-install file.
The reason pip has access to the PIP_EXTRA_INDEX_URL is because of this block in that file:
# Set Pip env vars
# This reads certain environment variables set on the Heroku app config
# and makes them accessible to the pip install process.
#
# PIP_EXTRA_INDEX_URL allows for an alternate pypi URL to be used.
if [[ -r "$ENV_DIR/PIP_EXTRA_INDEX_URL" ]]; then
PIP_EXTRA_INDEX_URL="$(cat "$ENV_DIR/PIP_EXTRA_INDEX_URL")"
export PIP_EXTRA_INDEX_URL
mcount "buildvar.PIP_EXTRA_INDEX_URL"
fi
Code like this is necessary to read config variables in buildpacks (see https://devcenter.heroku.com/articles/buildpack-api#buildpack-api), but you should be able to simply duplicate this code block, replacing PIP_EXTRA_INDEX_URL with PIP_INDEX_URL. Then set PIP_INDEX_URL to your private server's url and PIP_EXTRA_INDEX_URL to the default PyPI url.
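A sketch of that duplicated block (my guess at the edit, mirroring the code above; I leave out mcount, which is the buildpack's own metrics helper):
if [[ -r "$ENV_DIR/PIP_INDEX_URL" ]]; then
    PIP_INDEX_URL="$(cat "$ENV_DIR/PIP_INDEX_URL")"
    export PIP_INDEX_URL
fi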
If you are using another source instead of a private PyPI server, such as GitHub, and simply need a way to avoid hardcoding a username and password in your requirements.txt file, then also note that you can use environment variables in requirements.txt (see https://pip.pypa.io/en/stable/reference/pip_install/#using-environment-variables). You would just have to export them in bin/steps/pip-install as you would for PIP_INDEX_URL.
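For example (a sketch; GITHUB_TOKEN is a name I picked, and pip expands ${VAR} references in requirements files):
# requirements.txt
git+https://${GITHUB_TOKEN}@github.com/my_org/my_dependency.git#egg=my_dependency
# and, in the forked buildpack's bin/steps/pip-install:
if [[ -r "$ENV_DIR/GITHUB_TOKEN" ]]; then
    GITHUB_TOKEN="$(cat "$ENV_DIR/GITHUB_TOKEN")"
    export GITHUB_TOKEN
fi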

You could use a pre-compile step, as described here, to run something like M4 to do substitutions on your requirements.txt and fill in the password from the environment variable.
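A minimal sketch of that idea, using sed rather than M4 (my choice; the requirements.txt.in template name and the GITHUB_OAUTH_TOKEN placeholder are both assumptions):
# bin/pre_compile: render requirements.txt from a template kept in the repo.
# Assumes GITHUB_OAUTH_TOKEN is visible to the build environment.
sed "s|{{GITHUB_OAUTH_TOKEN}}|${GITHUB_OAUTH_TOKEN}|g" requirements.txt.in > requirements.txt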

Related

Installing a Git-hosted module with pip on OpenShift

I have a project with a requirements.txt resembling this:
-e git+https://some.gitlab.com/some_group/some_repo#egg=repo
selenium
pywinauto
I made a source secret on OpenShift with my username and password and started the build. Cloning the main project goes through, but cloning some_repo fails with the error "Can't find Username".
I'm a bit confused, because the main project was successfully cloned with the credentials provided in the secret, but it doesn't seem like pip is reusing those.
What's more confusing is that OpenShift seems to store the credentials in a .gitconfig file that should be known to pip:
I0107 15:35:14.756570 1 password.go:84] Adding username/password credentials to git config:
# credential git config
[credential]
helper = store --file=/tmp/gitcredentials.324456941
Any idea?
P.S. I wanted to try with an SSH key, but for some reason the admins don't want to enable this option on the company's GitLab. And I don't want to put credentials in the URL inside requirements.txt.
Edit: I have no problem with this on my workstation.
pip expects you to add the username and password as part of the URL if you are not using SSH keys. You could set the secrets as environment variables and refer to them in your pip.conf:
[global]
index-url = https://${username}:${password}@some.gitlab.com/some_group/some_repo

Python Development in multiple repositories

We are trying to find the best way to approach that problem.
Say I work in a Python environment, with pip & setuptools.
I work in a normal git flow, or so I hope.
So:
Move to feature branch in some app, make changes.
Move to feature branch in a dependent lib - Develop thing.
Point the app, using "-e git+ssh", to the feature branch of the dependent lib.
Create a Pull Request.
When this is all done, I want to merge stuff to master, but I can't without making yet another final change so that the app's requirements.txt (step 3 above) points back to the dependent lib's master branch.
Is there any good workflow for "micro services" or multiple dependent source codes in python that we are missing?
Python application workflow from development to deployment
It looks like you are searching for a way to develop a Python application using git.
The following description is applicable to any kind of Python-based application,
not only Pyramid-based web ones.
Requirements
Situation:
developing a Python-based solution using the Pyramid web framework
there are multiple Python packages participating in the final solution; packages might depend on each other
some packages come from the public PyPI, others might be private ones
source code controlled by git
Expectation:
the proposed working style shall allow:
pull requests
work in situations where packages are dependent
repeatable deployments
Proposed solution
Concepts:
even the Pyramid application is released as a versioned package
for a private PyPI, use devpi-server, incl. volatile and release indexes
for package creation, use pbr
use tox for package unit testing
test before you release a new package version
test before you deploy
keep deployment configuration separate from the application package
Pyramid web app as a package
Pyramid allows creation of applications in the form of a Python package. In
fact, the whole initial tutorial (21 stages) uses exactly this approach.
Although you can run the application in develop mode, you do not have to do so
in production. Running from a released package is easy.
Pyramid uses nice .ini configuration files. Keep development.ini in the
package repository, as it is an integral part of development.
On the other hand, make sure production .ini files are not present: they
should not mix with the application and belong with the deployment stuff.
To make deployment easier, add to your package a command which prints a
typical deployment configuration to stdout. Name the script e.g. myapp_gen_ini.
Write unittests and configure tox.ini to run them.
Keep deployment stuff separate from application
Mixing application code with deployment configurations will cause problems the
moment you have to install a second instance (as you are likely to change at
least one line of your configuration).
In deployment repository:
keep requirements.txt here, listing the application package and the other
packages needed for production. Be sure you specify an exact package version,
at least for your application package.
keep the production.ini file here. If you have more deployments, use one branch per deployment.
put tox.ini here
tox.ini shall have the following content:
[tox]
envlist = py27
# use py34 or others, if you prefer
[testenv]
commands =
deps =
    -rrequirements.txt
The expected use of the deployment repository is:
clone it to the server
run tox, this will create virtualenv .tox/py27
activate the virtualenv by $ source .tox/py27/bin/activate
if production.ini does not exist in the repo yet, run the command
$ myapp_gen_ini > production.ini to generate a template for the production
configuration
edit production.ini as needed.
test that it works.
commit the production.ini changes to the repository
do other stuff needed to deploy the app (configure web server, supervisord etc.)
For setup.py, use the pbr package
To make package creation simpler, and to keep package versioning tied to git
repository tags, use pbr. You will end up with setup.py being only 3 lines
long, and all the relevant stuff will be specified in setup.cfg in the form of
an ini file.
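For illustration, such a pair might look like this (a sketch based on pbr's documented usage; the metadata values are made up):
# setup.py
import setuptools

setuptools.setup(
    setup_requires=['pbr'],
    pbr=True,
)

# setup.cfg
[metadata]
name = my_package
summary = Short description of my_package
author = myself

[files]
packages =
    my_package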
Before you build for the first time, you have to have some files committed in
the git repository, otherwise it will complain. As you use git, this shall be
no problem.
To assign a new package version, tag it with $ git tag -a 0.2.0 and build. This
will create the package with version 0.2.0.
As a bonus, it will create AUTHORS and ChangeLog based on your commit
messages. Keep these files in .gitignore and use them to create AUTHORS.rst
and ChangeLog.rst manually (based on the autogenerated content).
When you push your commits to another git repository, do not forget to push the tags too.
Use devpi-server as private pypi
devpi-server is an excellent private PyPI which will bring you the following advantages:
having a private PyPI at all
cached public PyPI packages
faster builds of virtual environments (as it installs from cached packages)
being able to use pip even without internet connectivity
pushing between various types of package indexes: one for development
(published versions can change here), one for deployment (released versions do not change here)
simple unit test runs for anyone having access to it; it will even collect
the results and make them visible via a web page
In the described workflow it serves as the repository of Python packages which can be deployed.
The commands to use will be:
$ devpi upload to upload the developed package to the server
$ devpi test <package_name> to download, install, run the unit tests,
publish the test results to devpi-server and clean up the temporary installation
$ devpi push ... to push a released package to the proper index on devpi-server, or even to the public PyPI
Note that it is easy at any time to have the pip command configured to consume
packages from a selected index on the devpi server for $ pip install <package>.
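For instance (a sketch; the host and index names are placeholders):
# install straight from a devpi index
$ pip install --index-url http://devpi.example.com/myuser/prod/+simple/ my_package
# or let devpi-client write the index into pip's configuration
$ devpi use --set-cfg http://devpi.example.com/myuser/prod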
devpi-server is also ready for use in continuous integration testing.
How git fits into this workflow
The described workflow is not bound to a particular style of using git.
On the other hand, git can play its role in the following situations:
commit: the commit message will become part of the autogenerated ChangeLog
tag: defines versions (recognized by setup.py based on pbr).
As git is distributed, with multiple repositories, branches etc.,
devpi-server allows a similar distribution, as each user can have their own
working index to publish to. Anyway, in the end there will be one git repository
with a master branch to use. In devpi-server there will also be one agreed
production index.
Summary
The described process is not simple, but the complexity is proportionate to the complexity of the task.
It is based on tools:
tox
devpi-server
pbr (Python package)
git
The proposed solution allows:
managing Python packages, incl. release management
unit testing and continuous integration testing
any style of using git
deployment and development having clearly defined scopes and interactions
Your question assumes multiple repositories. The proposed solution allows decoupling multiple repositories by means of well-managed package versions, published to devpi-server.
We ended up using git dependencies and not devpi.
I think that when git is used, there is no need to add another package repository, as long as pip can use this.
The core issue, where the branch code (because of a second-level dependency) differs from the one merged to master, is not solved yet; instead, we work around it by working to remove that second-level dependency.

What is the Best Practice or most efficient way to update custom python modules in pythonanywhere?

For PythonAnywhere:
I am currently building a project where I have to change one of my installed packages frequently (because I am adding to the package as I build out the project). It is very manual and laborious to constantly update the package in the Bash console by reinstalling it every time I make a change locally. Is there a better process for this?
It sounds like you want to be able to use a single command from your local machine to push up some changes to PythonAnywhere. One way to go about it would be to use PythonAnywhere as a git remote. There are some details in this post, but, broadly:
username@PythonAnywhere:~$ mkdir my_repo.git
username@PythonAnywhere:~$ cd my_repo.git
username@PythonAnywhere:~/my_repo.git$ git init --bare
Then, on your PC:
git remote add pythonanywhere username@ssh.pythonanywhere.com:my_repo.git
Then you should be able to push to the bare repository on PA from your machine with a
git push pythonanywhere master
You can then use a Git post-receive hook to update the package on PythonAnywhere by whatever means you like. One might be to have your package checked out on PythonAnywhere:
username@PythonAnywhere:~$ git clone ./my_repo.git my_package
And then the post-receive hook could be as simple as
cd ~/my_package && git pull
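One wrinkle worth noting (my addition, not from the original answer): git sets GIT_DIR while hooks run, so a hook that pulls in another working copy should unset it first. A minimal sketch of ~/my_repo.git/hooks/post-receive (make it executable with chmod +x):
#!/bin/sh
# post-receive: refresh the checked-out copy after each push
unset GIT_DIR   # otherwise git would operate on the bare repo
cd ~/my_package && git pull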

python | heroku | how to access packages over ssh

Hi heroku python people,
I want my heroku app to access shared private libraries in my github account.
So I would like to have a requirements.txt file that looks like this ...
# requirements.txt
requests==1.2.2
-e git+ssh://git@github.com/jtushman/dict_digger.git#egg=dict_digger
And I would like it to use an SSH key that I upload with heroku keys:add, or some mechanism to get a private key from the Heroku CLI.
Right now I get the following error (which is I guess expected):
Host key verification failed.
It does work if I do (per @kenneth_reitz's https://stackoverflow.com/a/9136665/192791):
-e git+https://username:password@github.com/jtushman/dict_digger.git#egg=dict_digger
But it is really unworkable for me to put credentials in my requirements.txt file.
Has anyone come up with a nice solution for this?
I have also posted an issue on the heroku python buildpack project here
Kenneth, the maintainer of Heroku's Python buildpack, said the following (I am cutting and pasting here):
I would currently recommend the way mentioned (git over https)
Using the key you have registered with heroku would be cool, but
unfortunately, you would have to provide your private key for this to
work. Quite undesirable.
However, you could also write your keys into a .ssh folder in your app
or use .profile scripts to facilitate this.
Can see the full thread here: https://github.com/heroku/heroku-buildpack-python/issues/97
I had the same issue when I wanted to use django-avatar: the version on PyPI is old and doesn't support Django 1.5's custom user model.
The simple solution is to download the package and use it as a regular app, as if it were part of your project, then just git add . and push it, and it works!
It might not be the best idea, but it just works.

How to setup Git to deploy python app files into Ubuntu Server?

I set up a new Ubuntu 12.10 server on VPS hosting and installed all the required pieces like Nginx, Python, MySQL, etc. I am configuring it to deploy a Flask + Python app using uWSGI. It's working fine.
But to create a basic app I used PuTTY (from Windows) and created the required app .py files by hand.
I want to set up Git so that I can push my code to a directory, say /var/www/mysite.com/app_data, so that I don't have to use SSH or FileZilla etc. every time I make changes to my website.
Since I use both Ubuntu and Windows for development of the app, a Git-based setup would help me push my changes easily to my cloud server.
How can I set this up in Ubuntu? And how can I access it and deploy data using tools like Git Bash etc.?
Please suggest.
Modified version of innaM:
Concept
Have three repositories
devel - development on your local development machine
central - repository server - like GitHub, Bitbucket or anything other
prod - production server
Then you commit things from devel to central, and as soon as you want to deploy on prod, you ask prod to pull the data from central.
"Asking" the prod server to pull the updates can be managed by cron (then you have to wait a moment) or by other means, like a one-shot ssh call asking it to do a git pull and possibly restart your app.
Step by step
In more details you can go this way.
Prepare repo on devel
Develop and test the app on your devel server.
Put it into local repository:
$ git init
$ git add *
$ git commit -m "initial commit"
Create repo on central server
E.g. bitbucket provides this description: https://confluence.atlassian.com/display/BITBUCKET/Import+code+from+an+existing+project
Generally, you create the project on Bitbucket, find its url, and then from your devel repo call:
$ git remote add origin <bitbucket-repo-url>
$ git push origin
Clone central repo to prod server
Log onto your prod server.
Go to /var/www and clone from Bitbucket:
$ cd /var/www
$ git clone <bitbucket-repo-url>
$ cd mysite.com
and you shall have your directory ready.
Trigger publication of updates to prod
There are numerous options, one being a cron task which would regularly call
$ git pull
In case your app needs a restart after an update, you have to ensure the restart happens (this shall be possible using the git log command, which will show a new line after the update, or you may check the status code).
Personally, I would use a "one shot ssh" call (you asked not to use ssh, but I assume you are asking for a "simpler" solution, and a one-shot call shall work more simply than using ftp, scp or other magic).
From your devel machine (assuming you have ssh access there):
$ ssh user@prod.server.com "cd /var/www/mysite.com && git pull origin && myapp restart"
The advantage is that you control the moment the update happens.
Discussion
I use a similar workflow.
rsync seems in many cases to serve well enough or better (be aware of files created at app runtime, and of files in your app which shall be removed in later versions and must be removed on the server too).
salt (SaltStack) could serve too, but requires a bit more learning and setup.
I have learned that keeping source code and configuration data in the same repo sometimes makes the situation more difficult (that is why I am working on using salt).
The fab command from Fabric (Python based) may be the best option (in case installation on Windows becomes difficult, look at http://ridingpython.blogspot.cz/2011/07/installing-fabric-on-windows.html).
Create a bare repository on your server.
Configure your local repository to use the repository on the server as a remote.
When working on your local workstation, commit your changes and push them to the repository on your server.
Create a post-receive hook in the server repository that calls "git archive" and thus transfers your files to some other directory on the server (a sketch follows).
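A minimal sketch of such a hook (my wording; the target path reuses the question's /var/www/mysite.com/app_data):
#!/bin/sh
# hooks/post-receive in the bare repository: export the pushed tree.
# git archive writes a tar of the given ref; tar unpacks it into the
# deployment directory, so no .git directory ends up on the served path.
git archive master | tar -x -C /var/www/mysite.com/app_data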
