Caching Python requirements for production deployments

I'm building various Python-based projects that use pip/buildout to install dependencies. But I don't like the idea of someone deleting a GitHub project and crippling my apps, or a network outage meaning I can't perform a deployment.
How do other people solve this?
I've got various ideas, but I think perhaps the one that sounds most promising would be some kind of caching proxy server. I'd point pip to use this internal proxy server which would cache a copy of the downloaded project, and periodically check for updates (if there's a net connection) before serving cached versions.
Does anything like this already exist?
Use case:
I have a project which I deploy to web server 1. I add new features with a remote dependency, and when I come to update the production web server, PyPI is down so I can't deploy. Or perhaps when I come to set up a new web server, a dependency has disappeared from GitHub or wherever.
How can I make it so my deployments/dev environments can always be brought up regardless of what happens in the wider world?
Also, when I deploy, I won't deploy over the top of existing code. Rather I'll build a new virtualenv and switch over to it so I can rollback if anything goes wrong. So each time I deploy I'll need to rebuild my environment and will need dependencies to exist.
So I'm looking for a solution that will insulate me against short-term network outages to servers hosting dependencies, as well as guarding against projects being deleted.

You should keep a "reference copy" of the projects on which you depend.
If someone removes the project from GitHub (and PyPI and all the mirrors, and every other site on the net), then you still have the source and can keep distributing it to your own servers.

I have exactly the same requirements, and also use buildout to manage my deployments. I try not to install ANY of my package dependencies system-wide; I let buildout install eggs for all of them into my buildout. That way if I depend on a newer version of some package in rev N+1 of my project, and at "go-live" time N+1 falls on its face, I can roll back to N and automatically get the package dependencies that N worked with.
We run a private eggbasket server, and configure buildout to fetch packages only from that. Server contents were initialized by allowing buildout to grab eggs from the network one time, then copying the downloaded eggs.
This way, upgrades to each package are totally under control and I can ensure that 2 successive buildouts of the same snapshot of my code will build out the same thing. When I want to upgrade all, I will let buildout fetch most-recent-versions again, test test test, then copy my eggs to the eggbasket server to go into production mode.
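The same "fetch once, then install only from your own copy" idea can be sketched with pip instead of buildout. This is a minimal sketch, not the eggbasket setup described above: the function names and the "wheelhouse" directory are hypothetical, while `pip download`, `--no-index` and `--find-links` are standard pip options.

```python
import sys


def download_command(requirements, cache_dir):
    """Build the pip command that fills the local cache (needs the network once)."""
    return [sys.executable, "-m", "pip", "download",
            "-r", requirements, "-d", cache_dir]


def offline_install_command(requirements, cache_dir):
    """Build the pip command that installs only from the local cache."""
    return [sys.executable, "-m", "pip", "install",
            "--no-index",               # never contact PyPI
            "--find-links", cache_dir,  # look for packages in the cache only
            "-r", requirements]


if __name__ == "__main__":
    # "wheelhouse" is an assumed directory name, not something pip requires.
    print(" ".join(download_command("requirements.txt", "wheelhouse")))
    print(" ".join(offline_install_command("requirements.txt", "wheelhouse")))
```

Each list can be passed to subprocess.run(cmd, check=True). The download step is run deliberately when upgrading; every deployment after that uses only the offline install, so it should not depend on PyPI or GitHub being reachable.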

This is what I'm looking for:
http://pypi.python.org/pypi/collective.eggproxy

Related

How should I deploy a web application to Debian?

Ideally I’d like to build a package to deploy to Debian. Ideally the installation process would check the system has the required dependencies installed, as well as configure Cronjobs, set up users etc.
I’ve tried googling around and I understand a .deb is the format I can distribute in - but that is as far as I got since I’m getting confused now with the tooling I need to get up to speed with. The other option is to just git clone on the server and configure the environment manually… but that’s not preferable for obvious reasons.
How can I get started with building a Debian package, and is that the right direction for deploying web applications? If anyone could point me in the right direction tools-wise and perhaps to a tutorial, that would be massively appreciated :) Also, if you advise just taking the simple route with git, I'm happy to take that advice as well if you explain why. If it makes any difference, I'm deploying one Node.js and one Python web application.
You can for sure package everything as a Linux application; for example using pyinstaller for your python webapp.
Besides that, it depends on your use case.
I will focus on the second part of your question,
How can I get started with building a Debian package and is that the right direction for deploying web applications?
as that seems to be what you are after, given that your question already considers alternatives to .deb.
I want to deploy 1-2 websites on my linux server
In this case, I'd say manually git clone and configure everything. It's totally fine when you know that there won't be much more running on the server, and it is pretty hassle-free.
Why spend time packaging when no one will ever need the package again after you have installed it on your server?
I want to distribute my webapps to others on Debian
Here a .deb would make total sense. For example Plex media server and other applications are shipped like this.
If the official Debian wiki is too abstract, there are also other, more hands-on guides to get you started quickly. You could also get other .deb packages and extract them to see what they are made of. You mentioned one of your websites is using Python, so I suspect it might be Flask or Django. If it's Django, there is an example repository you might want to check out.
I want to run a lot of stuff on my server / distribute to other devs and platforms / or scale soon
In this case I would make the webapps into Docker containers. They are easy to build, share, and deploy. On top of that, you can easily bundle all dependencies and scripts to make sure everything is set up right. They are also easy to run and stop, so you have a simple "on/off" switch if your server is running low on resources while you want to run something else. I highly favour this solution, as it also allows you to easily control what is running on which IP when you deploy more and more applications to your server. But, as you pointed out, it runs with a bit of overhead and is not the best solution on weak hardware.
Also, if you know for sure what will be running on the server long term and don't need the flexibility I would probably skip Docker as well.

Enforcing use of a package management system when developing with a team

I work on a large python code base with several teammates. We are often installing or updating dependencies on other python packages, and inevitably this causes problems when someone else updates their master branch from git, or we deploy on a new system.
I've seen many tools available for deploying environments on new computers, which are great. The problem is that these tools only work if everyone is consistently updating the relevant files (e.g. requirements.txt, setup.py, tarballs on a PyPI server...) every time they update or add a package.
We use Github's pull request system for code reviews. What would be great would be some means of indicating to the reviewer that the dependency structure has changed, prompting the reviewer to check for the necessary updates (also good would be to build in a checklist that the reviewer has to complete, reminding them to do the check).
How have other folks dealt with this problem?
I would enforce the use of tools with a network proxy or network ACL that blocks public sites, and stand up internal services like GitLab, Bitbucket, GitHub Enterprise, or an internal PyPI server, to force the use of certain standards.
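A lighter-weight option for the reviewer prompt the question asks about is a small check that flags pull requests touching dependency files. This is only a sketch under assumptions: the set of file names and both function names are hypothetical, and the list of changed paths is assumed to come from something like git diff --name-only against the target branch.

```python
import posixpath

# Assumed list of files that describe dependencies; adjust to the project.
DEPENDENCY_FILES = {"requirements.txt", "setup.py", "setup.cfg", "buildout.cfg"}


def dependency_changes(changed_paths):
    """Return the changed paths whose file name is a known dependency file."""
    return sorted(path for path in changed_paths
                  if posixpath.basename(path) in DEPENDENCY_FILES)


def review_note(changed_paths):
    """Build a reminder for the reviewer, or an empty string if nothing applies."""
    hits = dependency_changes(changed_paths)
    if not hits:
        return ""
    lines = ["Dependency files changed in this pull request:"]
    lines += ["  - " + path for path in hits]
    lines.append("Reviewer: please confirm the package server and "
                 "environment files are up to date.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(review_note(["app/views.py", "requirements.txt"]))
```

Note that this only catches the case where a dependency file was edited. It cannot detect a new import that nobody added to requirements.txt, which still needs a clean-environment test run.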

Reusable Django apps + Ansible provisioning

I'm a long-time Django developer and have just started using Ansible, after using Vagrant for the last 18 months. Historically I've created a single VM for development of all my projects, and symlinked the reusable Django apps (Python packages) I create, to the site-packages directory.
I've got a working dev box for my latest Django project, but I can't really make changes to my own reusable apps without having to copy those changes back to a Git repo. Here's my ideal scenario:
I check out all the packages I need to develop as Git submodules within the site I'm working on
I have some way (symlinking or a better method) to tell Ansible to set up the box and install my packages from these Git submodules
I run vagrant up or vagrant provision
It reads requirements.txt and installs the remaining packages (things like South, Pillow, etc), but it skips my set of tools because it knows they're already installed
I hope that makes sense. Basically, imagine I'm developing Django. How do I tell Vagrant (via Ansible I assume) to find my local copy of Django, rather than the one from PyPI?
Currently the only way I can think of doing this is creating individual symlinks for each of those packages I'm developing, but I'm sure there's a more sensible model.
Thanks!
You should probably think of it slightly differently. You create a Vagrantfile which specifies Ansible as a provisioner. In that Vagrantfile you also specify which playbook to use for the vagrant provision step.
If your playbooks are written in an idempotent way, running them multiple times will skip steps that already match the desired state.
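To illustrate what "idempotent" means here, the following is a plain-Python sketch rather than Ansible code. The helper name ensure_line is hypothetical. It reports whether it changed anything, much as Ansible tasks report "changed" versus "ok".

```python
import os


def ensure_line(path, line):
    """Make sure `line` is present in the file; return True if the file changed."""
    text = ""
    if os.path.exists(path):
        with open(path) as handle:
            text = handle.read()
    if line in text.splitlines():
        return False  # already in the desired state, so do nothing
    with open(path, "a") as handle:
        if text and not text.endswith("\n"):
            handle.write("\n")  # keep the new entry on its own line
        handle.write(line + "\n")
    return True
```

Calling it twice with the same arguments changes the file only the first time, which is the property that lets vagrant provision be re-run safely.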
You should also think about what the desired end state of the VM should look like and write playbooks to accomplish that. Unless I'm misunderstanding something, all your playbook actions should be happening inside the VM, not directly on your local machine.

Python: tool to keep track of deployments

I'm looking for a tool to keep track of "what's running where". We have a bunch of servers, and on each of those a bunch of projects. These projects may be running on a specific version (hg tag/commit nr) and have their requirements at specific versions as well.
Fabric looks like a great start to do the actual deployments by automating the ssh part. However, once a deployment is done there is no overview of what was done.
Before reinventing the wheel I'd like to check here on SO as well (I did my best w/ Google but could be looking for the wrong keywords). Is there any such tool already?
(In practice I'm deploying Django projects, but I'm not sure that's relevant for the question; anything that keeps track of pip/virtualenv installs or server state in general should be fine)
many thanks,
Klaas
==========
EDIT FOR TEMP. SOLUTION
==========
For now, we've chosen to simply store this information in a simple key-value store (in our case: the filesystem) that we take great care to back up (in our case: using a DVCS). We keep track of this store with the same deployment tool that we use to do the actual deploys (in our case: Fabric).
Passwords are stored inside a TrueCrypt volume that's stored inside our key-value store.
==========
I will still gladly accept any answer when some kind of Open Source solution to this problem pops up somewhere. I might share (part of) our solution somewhere myself in the near future.
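A filesystem-backed store like the one described in the edit above could be as simple as one JSON file per server and project. This is a hypothetical sketch, not the poster's actual solution: the directory layout, field names and function names are all assumptions.

```python
import json
import os
import time


def record_deployment(store_dir, server, project, revision, requirements):
    """Write one JSON record at <store_dir>/<server>/<project>.json."""
    server_dir = os.path.join(store_dir, server)
    os.makedirs(server_dir, exist_ok=True)
    record = {
        "project": project,
        "revision": revision,
        "requirements": sorted(requirements),
        "deployed_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    path = os.path.join(server_dir, project + ".json")
    with open(path, "w") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
    return path


def what_runs_where(store_dir):
    """Return {(server, project): revision} for every record in the store."""
    overview = {}
    for server in sorted(os.listdir(store_dir)):
        server_dir = os.path.join(store_dir, server)
        if not os.path.isdir(server_dir):
            continue
        for name in sorted(os.listdir(server_dir)):
            if name.endswith(".json"):
                with open(os.path.join(server_dir, name)) as handle:
                    record = json.load(handle)
                overview[(server, record["project"])] = record["revision"]
    return overview
```

Because the records are small text files with sorted keys, committing the store directory to version control after each deploy gives a readable history of what changed on which server.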
pip freeze gives you a listing of all installed packages. Bonus: if you redirect the output to a file, you can use it as part of your deployment process to install all those packages (pip can programmatically install all packages from the file).
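I see you're already using virtualenv. Good. You can run pip freeze -E myvirtualenv > myproject.reqs to generate a dependency file that doubles as a status report of the Python environment. (The -E option was removed in later pip releases; with a current pip, run the virtualenv's own pip instead, e.g. myvirtualenv/bin/pip freeze > myproject.reqs.)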
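To use the freeze output as a status report from a deployment script, it can be parsed into a dictionary. A minimal sketch, with hypothetical function names; it keeps only name==version lines and skips editable installs and other line formats.

```python
import subprocess
import sys


def parse_freeze(text):
    """Turn `pip freeze` output into a {name: version} dict for pinned lines."""
    pinned = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("-e "):
            continue  # skip blanks, comments and editable installs
        if "==" in line:
            name, version = line.split("==", 1)
            pinned[name.strip().lower()] = version.strip()
    return pinned


def freeze_current_environment():
    """Run pip freeze with the interpreter executing this script."""
    result = subprocess.run([sys.executable, "-m", "pip", "freeze"],
                            check=True, capture_output=True, text=True)
    return parse_freeze(result.stdout)
```

Running the script with the virtualenv's own interpreter makes freeze_current_environment() report that environment's packages.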
Perhaps you want something like Opscode Chef.
In their own words:
"Chef works by allowing you to write recipes that describe how you want a part of your server (such as Apache, MySQL, or Hadoop) to be configured. These recipes describe a series of resources that should be in a particular state - for example, packages that should be installed, services that should be running, or files that should be written. We then make sure that each resource is properly configured, only taking corrective action when it's necessary. The result is a safe, flexible mechanism for making sure your servers are always running exactly how you want them to be."
EDIT: Note Chef is not a Python tool, it is a general purpose tool, written in Ruby (it seems). But it is capable of supporting various "cookbooks", including one for installing/maintaining Python apps.

Practices while releasing the python/ruby/script based web applications on production

I am purely a Windows programmer and spend all my time hacking VC++.
Recently I have been heading several web-based applications; I have built applications with Python (Pylons framework) myself and am doing projects on Rails. All the web projects are hosted on Ubuntu Linux.
The release procedures and checklists we followed for building and releasing VC++ Windows applications are no longer much use when it comes to script-based languages.
So we don't build any binaries now. I used to copy ASP/PHP files into the IIS folder through an FTP server when using open source CMS applications.
So FTP is one way to get the files onto the web server. Now we feel too lazy to copy files via FTP; instead we use an SVN checkout and simply do svn update to get the latest copy.
Are SVN checkout and svn update the right methods to get the latest build files onto the server? Are there any downsides to using svn update? Is there any better method to release script-based web applications to the production server?
PS: I have used an SSH server to some extent on the Linux platform.
I would create a branch in SVN for every release of the web application, and when the release is ready there, I would check it out on the server and either set it to be run or move it into the place of the old version.
Are SVN checkout and svn update the right methods to get the latest build files onto the server?
Very, very good methods. You know what you got. You can go backwards at any time.
Are there any downsides to using svn update?
None.
Is there any better method to release script-based web applications to the production server?
What we do.
We do not run out of the SVN checkout directories. The SVN checkout directory is "raw" source sitting on the server.
We use Python's setup.py install to create the application in /opt/app/app-x.y directory tree. Each tagged SVN branch is also a branch in the final installation.
Ruby has gems and other installation tools that are probably similar to Python's.
Our web site's Apache and mod_wsgi configurations refer to a specific /opt/app/app-x.y version. We can then stage a version, do testing, do things like migrate data from production to the next release, and generally get ready.
Then we adjust our Apache and mod_wsgi configuration to use the next version.
Previous versions are all in place. And left in place. We'll delete them some day when they confuse us.
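The answer above switches versions by editing the Apache and mod_wsgi configuration. A related technique, shown here as a sketch and not as what this answer does, keeps the configuration fixed on a "current" symlink and repoints the link. The function name and all paths are hypothetical.

```python
import os


def activate_release(releases_dir, version, current_link):
    """Point `current_link` at releases_dir/version, replacing any old link."""
    target = os.path.join(releases_dir, version)
    if not os.path.isdir(target):
        raise FileNotFoundError(target)
    temp_link = current_link + ".tmp"
    if os.path.lexists(temp_link):
        os.remove(temp_link)  # clean up a leftover from an interrupted run
    os.symlink(target, temp_link)
    # Renaming over the old link is atomic on POSIX, so readers see either
    # the old release or the new one, never a missing link.
    os.replace(temp_link, current_link)
    return os.path.realpath(current_link)
```

Rolling back is the same call with the previous version string, which works because the old release directories are left in place. The web server still typically needs a reload to pick up the new code.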
The one downside of doing an svn update on your web root is that the .svn directories can potentially be made public, so be careful about permissions on the web server.
That said, there are far better ways to deploy an app built with dynamic languages. In the Rails world, Capistrano is a mature deployment tool, as is Vlad the Deployer. Capistrano can easily be used for non-Rails deployments.
There are many deployment strategies based on version control. You could go through the tutorials and get some ideas.
I'd also like to add that even though we do not "build" (compile) the project in dynamic languages, we do "build" (test/integration) them. A serious project would use a continuous integration server to validate the integrated "build" of project on every commit or integration, well before it gets to production.
One downside of doing an svn update: though you can go back in time, which revision do you go back to? You have to look it up. svn update pseudo-deployments work much more cleanly if you use tags - in that case you'd be doing an svn switch to a different tag, not an svn update on the same branch or the trunk.
You want to tag your software with a version number, something like 1.1.4, and then have a simple script to zip it up as application-1.1.4.zip and deploy it - then you have automated, repeatable releases and rollbacks, as well as greater visibility into what is changing between releases.
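The "simple script to zip it up" could look roughly like this. It is a sketch assuming the tagged source has already been exported to a directory; the function name and arguments are hypothetical.

```python
import os
import shutil


def build_release_archive(source_dir, app_name, version, output_dir):
    """Zip source_dir into output_dir/<app_name>-<version>.zip and return its path."""
    os.makedirs(output_dir, exist_ok=True)
    base_name = os.path.join(output_dir, "%s-%s" % (app_name, version))
    # make_archive appends the ".zip" extension itself.
    return shutil.make_archive(base_name, "zip", root_dir=source_dir)
```

Keep output_dir outside source_dir so the archive does not try to include itself. Exporting the tag first (for example with svn export) also keeps the .svn directories out of the release.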
