Optimizing build time for ReadTheDocs project - python

I am developing a reasonably sized binary Python library, Parselmouth, which takes some time to build, mainly because I am wrapping an existing program with a large codebase. Consequently, now that I'm trying to set up API documentation, builds on ReadTheDocs hit either the 15-minute time limit or the 1 GB memory limit (when I multithread the build, some expensive template instantiations make the compiler process get killed).
However, I have successfully set up Travis CI builds that use ccache to recompile only the changed parts of the wrapper code rather than the whole large codebase.
I have been thinking about installing from PyPI, but then the versioning gets complicated, and intermediate development builds do not get good API documentation.
So I was wondering: is there a known solution for this kind of case, maybe using the builds from Travis CI?

What I ended up doing to solve this problem was to use Bintray to upload the wheels built on Travis CI. After this build and upload have succeeded, I manually trigger the ReadTheDocs build, which then installs the project with the right Python wheel from Bintray.
For more details, see this commit
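The manual trigger can also be scripted once the Travis deploy stage has finished. Below is a rough standard-library-only sketch, assuming a generic-webhook integration is configured on Read the Docs; the project slug, webhook id, and token are placeholders, and the exact endpoint shape is my assumption, so check it against your RTD project's integration settings:

```python
# Sketch: ask Read the Docs to rebuild after a fresh wheel is uploaded.
# Slug, webhook id and token below are placeholders, not real values.
from urllib.parse import urlencode
from urllib.request import Request, urlopen

RTD_WEBHOOK = "https://readthedocs.org/api/v2/webhook/{slug}/{id}/"

def webhook_url(slug, webhook_id):
    """Build the generic-webhook URL for a Read the Docs project."""
    return RTD_WEBHOOK.format(slug=slug, id=webhook_id)

def trigger_build(slug, webhook_id, token):
    # POST the integration token; RTD then rebuilds the docs, and the
    # docs build installs the project from the freshly uploaded wheel.
    data = urlencode({"token": token}).encode()
    return urlopen(Request(webhook_url(slug, webhook_id), data=data))
```

Running such a step as the last Travis job removes the manual trigger from the loop.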

Related

python tox, creating rpm virtualenv, as part of ci pipeline, unsure about where in workflow

I'm investigating how Python applications can also use a CI pipeline, but I'm not sure how to create the standard workflow.
Jenkins is used to do the initial repository clone, and then initiates tox. Basically, this is where Maven and/or MSBuild would get dependency packages and build... which tox does via pip, so all good here.
But now for the confusing part: the last part of the pipeline is creating and uploading packages. Devs would likely upload created packages to a local pip repository, but then possibly also create a deployment package. In this case it would need to be an RPM containing a virtualenv of the application. I have made one manually using rpmvenv, but regardless of how it's made, how would such a step be added to a tox config? In the case of rpmvenv, it creates its own virtualenv, a self-contained command so to speak.
I like going with the Unix philosophy for this problem: have a tool that does one thing incredibly well, then compose other tools together. Tox is purpose-built to run your tests in a bunch of different Python environments, so using it to also build a deb/rpm/etc. feels like a bit of a misuse of that tool. It's probably easier to use tox just to run all your tests and then, depending on the results, have another step in your pipeline deal with building a package for what was just tested.
Jenkins 2.x, which is fairly recent at the time of this writing, seems to be much better about building pipelines. Buildbot is going through a decent amount of development and already makes it fairly easy to build a good pipeline for this as well.
What we've done at my work is:
Buildbot in AWS, which receives push notifications from GitHub on PRs
That kicks off a Docker container that pulls in the current code and runs tox (py.test, flake8, as well as Protractor and Jasmine tests)
If the tox step comes back clean, kick off a different docker container to build a deb package
Push that deb package up to S3 and let Salt deal with telling those machines to update
That deb package is also just available as a build artifact, similar to what Jenkins 1.x would do. Once we're ready to go to staging, we just take that package and promote it to the staging debian repo manually. Ditto for rolling it to prod.
Tools I've found useful for all this:
Buildbot, because it's in Python and thus easier for us to work on, but Jenkins would work just as well. Regardless, this is the controller for the entire pipeline
Docker, because each build should be completely isolated from every other build
Tox, the glorious test runner, to handle all those details
fpm builds the package: RPM, DEB, tar.gz, whatever. Very configurable and easy to script.
Aptly makes it easy to manage debian repositories and in particular push them up to S3.
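As a concrete illustration of the "tox only runs tests" split, a minimal tox.ini might look like the sketch below; the environment names, dependency file, and commands are illustrative, not an exact config. The deb/rpm build then happens as a separate pipeline stage, e.g. an fpm run in its own Docker container:

```ini
# Illustrative tox.ini: tox runs the tests, nothing else.
# Packaging (fpm, aptly, ...) lives in a separate pipeline stage.
[tox]
envlist = py27,py36,flake8

[testenv]
deps =
    -rrequirements.txt
    pytest
commands = py.test {posargs}

[testenv:flake8]
deps = flake8
commands = flake8 src tests
```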

How to publish generic builds via python to an artifact repository?

I am looking for a flexible solution for uploading generic builds to an artifact repository (in my case it would be Artifactory, but I would not mind if it also supported others, like Nexus).
Because I am not building Java code, adding Maven to the process would only add unneeded complexity to the game.
Still, the entire infrastructure already supports bash and Python everywhere (including Windows), which makes me interested in finding something that involves those two.
I do know that I could code it myself, but now I am looking for a way to make it as easy and flexible as possible.
Gathering the metadata seems simple; only publishing it in the format required by the artifact repository seems to be the issue.
After discovering that the two existing Python packages related to Artifactory are kinda useless (neither is actively maintained, one is usable only as a query interface, and the other has serious bugs that prevent its use), I discovered something that seems close to what I was looking for: http://teamfruit.github.io/defend_against_fruit/
Still, it seems it was designed to deal only with Python packages, not with generic builds.
Some points to consider:
Tools like Maven and Gradle are capable of building more than Java projects. Artifactory already integrates with them and this includes gathering the metadata and publishing it together with the build artifacts.
The Artifactory Jenkins plugin supports generic (freestyle) builds. You can use this integration to deploy whatever type of files you like.
You can create your own integration based on Artifactory's open integration layer for CI build servers - build-info. This is an open source project and all the implementations are also open sourced. The relevant Artifactory REST APIs are documented here.
Disclaimer: I'm affiliated with Artifactory
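Since the infrastructure already has Python everywhere, deploying a generic artifact can be as small as an authenticated HTTP PUT against the target path, with Artifactory's matrix parameters (the ;key=value suffix on the URL) carrying the gathered metadata. A standard-library sketch follows; the base URL, repository key, and property names are placeholders:

```python
# Sketch: deploy a generic artifact to Artifactory with a plain HTTP PUT.
# Base URL, repo key, credentials and properties below are placeholders.
import base64
from urllib.request import Request, urlopen

def deploy_url(base, repo, path, **props):
    """Join base URL, repository key and target path, then append
    Artifactory matrix properties as ;key=value pairs."""
    url = "{}/{}/{}".format(base.rstrip("/"), repo, path.lstrip("/"))
    url += "".join(";{}={}".format(k, v) for k, v in sorted(props.items()))
    return url

def deploy(base, repo, path, data, user, password, **props):
    # A simple authenticated PUT uploads the bytes; Artifactory records
    # the matrix properties as searchable metadata on the artifact.
    auth = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    req = Request(deploy_url(base, repo, path, **props),
                  data=data, method="PUT")
    req.add_header("Authorization", "Basic " + auth)
    return urlopen(req)
```

For example, `deploy(base, "generic-local", "app/app-1.0.tar.gz", blob, user, pw, build="7")` would upload under that path and tag the artifact with `build=7`.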

Python Build Script

What are some best practices for creating a build script for a Python project? Specifically, not for building a Python library, but a Python application (e.g. standalone server app or web app). What are some frameworks out there that support:
Managing dependencies for different environments (dev, test, prod, etc.)
Running tasks: start, stop, test, pydoc, pep8, etc.
Preparing for deployment with a virtualenv: creating a package (tar.gz, rpm, egg, etc.)
I usually use setuptools/easy_install for doing this, which has its limitations. However, I read an article saying one should use distutils/pip. Which of these is more robust? Are there any other choices?
Thank you
If you're looking for a good build framework, SCons, which is Python-based, is a good make substitute.
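For reference, an SCons build description (an SConstruct file) is plain Python executed by the scons command, so it doubles as an example of "Python as build language". A hypothetical minimal sketch for a small C target (the file name hello.c is illustrative):

```python
# SConstruct (sketch): SCons build files are ordinary Python,
# run by the `scons` command rather than the python interpreter.
env = Environment()               # toolchain auto-detected per platform
env.Program('hello', 'hello.c')   # build ./hello, with dependency tracking
```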

What are the reasons for not hosting a compiler on a live server?

Where I currently work we've had a small debate about deploying our Python code to the production servers. I voted to build binary dependencies (like the Python MySQL drivers) on the server itself, just using pip install -r requirements.txt. This was quickly vetoed with no better explanation than "we don't put compilers on the live servers". As a result our deployment process is becoming convoluted and over-engineered simply to avoid this compilation step.
My question is this: What's the reason these days to avoid having a compiler on live servers?
In general, the prevailing wisdom on server installs is that they should be as stripped-down as possible. There are a few motivations for this, but they don't really apply all that directly to your question about a compiler:
Minimize resource usage. GCC might take up a little extra disk space, but probably not enough to matter - and it won't be running most of the time, so CPU/memory usage isn't a big concern.
Minimize complexity. Building on your server might add a few more failure modes to your build process (if you build elsewhere, then at least you will notice something wrong before you go mess with your production server), but otherwise, it won't get in the way.
Minimize attack surface. As others have pointed out, by the time an attacker can make use of a compiler, you're probably already screwed.
At my company, we generally don't care too much if compilers are installed on our servers, but we also never run pip on our servers, for a rather different reason. We're not so concerned about where packages are built, but when and how they are downloaded.
The particularly paranoid among us will take note that pip (and easy_install) will happily install packages from PyPI without any form of authentication (no SSL, no package signatures, ...). Further, many of these aren't actually hosted on PyPI; pip and easy_install follow redirects. So, there are two problems here:
If PyPI, or any of the other sites on which your dependencies are hosted, goes down, then your build process will fail
If an attacker somehow manages to perform a man-in-the-middle attack against your server as it's attempting to download a dependency package, then he'll be able to insert malicious code into the download
So, we download packages when we first add a dependency, do our best to make sure the source is genuine (this is not foolproof), and add them into our own version-control system. We do actually build our packages on a separate build server, but this is less crucial; we simply find it useful to have a binary package we can quickly deploy to multiple instances.
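The "do our best to make sure the source is genuine" step can be partly mechanized: record a digest when a dependency is first vetted and vendored, then check every later copy against it. A small sketch (the pinning scheme itself is my illustration of the idea, not a pip feature):

```python
# Sketch: verify a vendored archive against a digest pinned at review time.
import hashlib
import hmac

def sha256_hex(data):
    """Hex digest of a downloaded archive's bytes."""
    return hashlib.sha256(data).hexdigest()

def matches_pin(data, pinned_hex):
    # Constant-time comparison against the digest recorded when the
    # dependency was first reviewed and added to version control.
    return hmac.compare_digest(sha256_hex(data), pinned_hex)
```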
I would suggest referring to this serverfault post. In short:
It makes sense to avoid exploits being compiled remotely.
In terms of security, the absence of a compiler only makes a hijacker's task harder than it would be with one present; it's not a perfect defense.
Would it put a heavy strain on the server? As noted above, a compiler that isn't running costs little in CPU or memory.

Python: tool to keep track of deployments

I'm looking for a tool to keep track of "what's running where". We have a bunch of servers, and on each of those a bunch of projects. These projects may be running on a specific version (hg tag/commit nr) and have their requirements at specific versions as well.
Fabric looks like a great start to do the actual deployments by automating the ssh part. However, once a deployment is done there is no overview of what was done.
Before reinventing the wheel I'd like to check here on SO as well (I did my best with Google but could be looking for the wrong keywords). Is there any such tool already?
(In practice I'm deploying Django projects, but I'm not sure that's relevant for the question; anything that keeps track of pip/virtualenv installs or server state in general should be fine)
many thanks,
Klaas
==========
EDIT FOR TEMP. SOLUTION
==========
For now, we've chosen to simply store this information in a simple key-value store (in our case: the filesystem) that we take great care to back up (in our case: using a DCVS). We keep track of this store with the same deployment tool that we use to do the actual deploys (in our case: Fabric).
Passwords are stored inside a TrueCrypt volume that's stored inside our key-value store.
==========
I will still gladly accept any answer when some kind of Open Source solution to this problem pops up somewhere. I might share (part of) our solution somewhere myself in the near future.
pip freeze gives you a listing of all installed packages. Bonus: if you redirect the output to a file, you can use it as part of your deployment process to install all those packages (pip can programmatically install all packages from the file).
I see you're already using virtualenv. Good. You can run pip freeze -E myvirtualenv > myproject.reqs to generate a dependency file that doubles as a status report of the Python environment.
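The same status report can also be produced from inside Python, which is handy when recording it automatically as part of a deploy. A sketch for modern interpreters (importlib.metadata requires Python 3.8+; on the era's pip, the shell command above is the way to go):

```python
# Sketch: a pip-freeze-style listing without shelling out to pip.
from importlib import metadata  # Python 3.8+

def freeze():
    """Return sorted 'name==version' lines for the current environment."""
    return sorted(
        "{}=={}".format(name, dist.version)
        for dist in metadata.distributions()
        if (name := dist.metadata["Name"])  # skip broken installs
    )
```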
Perhaps you want something like Opscode Chef.
In their own words:
Chef works by allowing you to write recipes that describe how you want a part of your server (such as Apache, MySQL, or Hadoop) to be configured. These recipes describe a series of resources that should be in a particular state: for example, packages that should be installed, services that should be running, or files that should be written. We then make sure that each resource is properly configured, only taking corrective action when it's necessary. The result is a safe, flexible mechanism for making sure your servers are always running exactly how you want them to be.
EDIT: Note that Chef is not a Python tool; it is a general-purpose tool written in Ruby. But it is capable of supporting various "cookbooks", including ones for installing/maintaining Python apps.
