Git and shared Python library

This might be a novice question, so please excuse me.
We have a small python development team and our repo is organized as indicated below. We have a custom library that is shared across multiple scripts (wrappers) and then libraries specific to each wrapper. The below structure is maintained under Git. This has worked so far for development. Now we would like to release the wrappers, but individually. As in, we need to release wrapper1 targeting a separate audience (different timelines and needs) and at a later time wrapper2. Both need to contain the shared_library and only their specific library. What is the best way to do this?
    repo/
        wrapper1.py
        wrapper2.py
        shared_library/
            module1.py
            module2.py
        wrapper1_specific_lib/
            wrapper1_module1.py
            wrapper1_module2.py
        wrapper2_specific_lib/
            wrapper2_module1.py
            wrapper2_module2.py
We have considered the following solutions:
1. Re-organize as three separate repos (wrapper1, wrapper2 and shared_library) and release each separately.
2. Keep two separate repos for wrapper1 and wrapper2 and periodically sync the shared library (!!?!!)
3. Leave it as is, but explore whether Git can be used in some way to release only the files and folders specific to each wrapper.
Seeking your help on Python code organization for better release management using Git. Thanks in advance!!

You have 3 products, each with its own release timeline. Wrapper1 users are not interested in seeing wrapper2 code, and vice versa. Breaking out into 3 repos would be the simplest approach.
Do take care to package automated unit tests into the shared_library repo. Both of the dependent apps should be able to successfully run their own tests plus the shared tests. This will become important as the released apps submit new feature requests and try to pull the shared_library in different directions.
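If you do split into three repos, the shared library can be packaged once and declared as a dependency by each wrapper, which keeps the release timelines independent. A minimal sketch of its setup.py, assuming the package keeps the name shared_library and ships its tests inside the package (the version and layout here are illustrative):

    # setup.py in the shared_library repo -- a minimal sketch, not a drop-in file
    from setuptools import setup, find_packages

    setup(
        name="shared_library",
        version="1.0.0",              # illustrative
        packages=find_packages(),     # picks up shared_library and shared_library.tests, if present
    )

    # Each wrapper repo then declares something like:
    #     install_requires=["shared_library>=1.0,<2.0"]
    # and can run the shared tests alongside its own, e.g.:
    #     python -m pytest --pyargs shared_library.tests

Each wrapper is then released on its own timeline and simply pins the shared_library version it was tested against.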

Related

Enforcing use of a package management system when developing with a team

I work on a large python code base with several teammates. We are often installing or updating dependencies on other python packages, and inevitably this causes problems when someone else updates their master branch from git, or we deploy on a new system.
I've seen many tools available for deploying environments on new computers, which are great. The problem is that these tools only work if everyone is consistently updating the relevant files (e.g. requirements.txt, setup.py, tarballs on a PyPI server...) every time they update or add a package.
We use Github's pull request system for code reviews. What would be great would be some means of indicating to the reviewer that the dependency structure has changed, prompting the reviewer to check for the necessary updates (also good would be to build in a checklist that the reviewer has to complete, reminding them to do the check).
How have other folks dealt with this problem?
I would enforce the use of such tools with a network proxy or network ACL: block the public sites and stand up internal services (GitLab, Bitbucket, GitHub Enterprise, an internal PyPI server) to force the use of certain standards.
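A lighter-weight complement, if blocking traffic is too heavy-handed, is a CI check that fails the pull request whenever the declared requirements have drifted from what is actually installed, which at least surfaces the change to the reviewer. A rough sketch, assuming a requirements.txt at the repo root that pins exact versions in pip freeze format (the file name and failure policy are illustrative):

    # check_requirements.py -- rough sketch: fail the build if requirements.txt
    # no longer matches the packages installed in the current environment.
    # Assumes requirements.txt pins exact versions (pkg==x.y.z), like pip freeze output.
    import subprocess
    import sys

    def read_requirements(path="requirements.txt"):
        with open(path) as f:
            return set(
                line.strip() for line in f
                if line.strip() and not line.startswith("#")
            )

    def main():
        declared = read_requirements()
        installed = set(
            subprocess.check_output([sys.executable, "-m", "pip", "freeze"])
            .decode().splitlines()
        )
        drift = declared.symmetric_difference(installed)
        if drift:
            print("requirements.txt and the installed packages disagree:")
            for line in sorted(drift):
                print("  " + line)
            sys.exit(1)

    if __name__ == "__main__":
        main()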

Using GitHub's Release functionality with a client and server project in a single repo

I have a single github repository for a client and server project I'm working on. I'm maintaining versions separately for both project components and will release them at different times and intervals. Different server versions have backwards compatibility.
Both components are written in python.
I would like to use github's releases interface (and API) to release versions for both components separately, and not quite sure how this should work. I googled for common practices and searched github's help and couldn't find anything that addresses my question.
I would also like to use github's releases API as an update mechanism (by pulling for the list of releases and comparing that to a local version) for at least the client component.
Any help or direction will be appreciated!
My question is: how should I create the releases themselves? Should I name them differently based on component? Should I include the component in the version field? Any other way?
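For what it's worth, a common convention is to give the tag a component prefix (e.g. client-v1.2.0, server-v0.9.1), use the tag name as the release name, and filter on that prefix when polling the releases API for updates. A rough sketch of the client-side check (the repository name and version scheme are illustrative):

    # Poll GitHub's releases API and pick the newest release for one component.
    # Assumes tags like "client-v1.2.0" / "server-v0.9.1"; the repo name is a placeholder.
    import json
    import urllib.request

    REPO = "owner/project"
    PREFIX = "client-v"

    def latest_component_version():
        url = "https://api.github.com/repos/%s/releases" % REPO
        with urllib.request.urlopen(url) as resp:
            releases = json.load(resp)
        versions = [
            r["tag_name"][len(PREFIX):]
            for r in releases
            if r["tag_name"].startswith(PREFIX)
        ]
        if not versions:
            return None
        # Naive numeric comparison; a real updater should use proper version parsing.
        return max(versions, key=lambda v: tuple(int(x) for x in v.split(".")))

The local component can then compare its own version string against the returned one to decide whether to update.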

How to publish generic builds via python to an artifact repository?

I am looking for a flexible solution for uploading generic builds to an artifact repository (in my case it would be Artifactory, but I would not mind if it also supported others, like Nexus).
Because I am not building Java code, adding Maven to the process would only add unneeded complexity.
Still, the entire infrastructure already supports bash and Python everywhere (including Windows), which makes me interested in finding something that involves those two.
I do know that I could code it myself, but now I am looking for a way to make it as easy and flexible as possible.
Gathering the metadata seems simple; only publishing it in the format required by the artifact repository seems to be the issue.
After discovering that the existing Python packages related to Artifactory are of little use (they are not actively maintained, one is only usable as a query interface, and the others have serious bugs that prevent their use), I found something that seems close to what I was looking for: http://teamfruit.github.io/defend_against_fruit/
Still, it seems it was designed to deal only with Python packages, not with generic builds.
Some points to consider:
Tools like Maven and Gradle are capable of building more than Java projects. Artifactory already integrates with them and this includes gathering the metadata and publishing it together with the build artifacts.
The Artifactory Jenkins plugin supports generic (freestyle) builds. You can use this integration to deploy whatever type of files you like.
You can create your own integration based on Artifactory's open integration layer for CI build servers - build-info. This is an open source project and all the implementations are also open sourced. The relevant Artifactory REST APIs are documented here.
Disclaimer: I'm affiliated with Artifactory
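If you do end up scripting it yourself, the deploy itself can be as simple as an HTTP PUT to the target path in the repository. A rough sketch using the requests library (server URL, repository key and credentials are placeholders, and checksum headers/properties are left out for brevity):

    # Rough sketch of deploying a generic build artifact to Artifactory via HTTP PUT.
    # The URL, repository key and credentials below are placeholders.
    import requests

    ARTIFACTORY_URL = "https://artifactory.example.com/artifactory"
    REPO_KEY = "generic-builds-local"

    def deploy(local_path, target_path, user, api_key):
        url = "%s/%s/%s" % (ARTIFACTORY_URL, REPO_KEY, target_path)
        with open(local_path, "rb") as f:
            response = requests.put(url, data=f, auth=(user, api_key))
        response.raise_for_status()
        # Artifactory normally echoes back metadata (URI, checksums) for the deployed file.
        return response.json()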

How do you distribute Python scripts?

I have a server which executes Python scripts from a certain directory path. Incidentally, this path is a checkout of the SVN trunk version of the scripts. However, I get the feeling that this isn't the right way to provide and update scripts for a server.
Do you suggest other approaches? (compile, copy, package, ant etc.)
In the end a web server will execute some Python script with parameters. How do I do the update process?
Also, I have trouble deciding how best to handle updated versions which only work for new projects on the server. That is, if I update the Python scripts, only newly created web jobs will know how to handle them. Do I "deliver" to one of many directories which keep track of versions, so the server picks the right one?!
EDIT: The webserver is basically an interface that runs some data analysis. That analysis is done by the actual scripts, which take some parameters and mingle data. I don't really change the web interface; I only need to update the data scripts stored on the webserver. Indeed, in some advanced version the web server should also pick the right version of my data scripts. However, at the moment I have no idea which would be the easiest way.
The canonical way of distributing Python code/functionality is by using a PyPI-compliant package server.
A list of available PyPI implementations on python.org:
http://wiki.python.org/moin/PyPiImplementations
Instructions on setting up and using EggBasket:
http://chrisarndt.de/projects/eggbasket/#installation
Instructions on installing ChiShop:
http://justcramer.com/2011/04/04/setting-up-your-own-pypi-server/
Note that for this to work you need to distribute your code as "Eggs"; you can find out how to do this here: http://peak.telecommunity.com/DevCenter/setuptools
A great blog post on the usage of eggs and the different parts in packaging: http://mxm-mad-science.blogspot.com/2008/02/python-eggs-simple-introduction.html
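Concretely, that means giving the analysis scripts a proper package with a setup file, so the server can install or upgrade them from your internal index with a single command instead of an SVN checkout. A minimal sketch (project name, package layout and entry point are illustrative):

    # setup.py -- minimal sketch for packaging the analysis scripts.
    # Assumes an analysis/ package with a main() function in analysis/main.py.
    from setuptools import setup, find_packages

    setup(
        name="analysis-scripts",
        version="0.1.0",
        packages=find_packages(),
        entry_points={
            "console_scripts": [
                "run-analysis = analysis.main:main",  # installed as a command on the server
            ],
        },
    )

The versioning question then becomes a matter of which package version each web job installs or imports, rather than which directory it points at.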

Python: tool to keep track of deployments

I'm looking for a tool to keep track of "what's running where". We have a bunch of servers, and on each of those a bunch of projects. These projects may be running on a specific version (hg tag/commit nr) and have their requirements at specific versions as well.
Fabric looks like a great start to do the actual deployments by automating the ssh part. However, once a deployment is done there is no overview of what was done.
Before reinventing the wheel I'd like to check here on SO as well (I did my best w/ Google but could be looking for the wrong keywords). Is there any such tool already?
(In practice I'm deploying Django projects, but I'm not sure that's relevant for the question; anything that keeps track of pip/virtualenv installs or server state in general should be fine)
many thanks,
Klaas
==========
EDIT FOR TEMP. SOLUTION
==========
For now, we've chosen to simply store this information in a simple key-value store (in our case: the filesystem) that we take great care to back up (in our case: using a DCVS). We keep track of this store with the same deployment tool that we use to do the actual deploys (in our case: fabric)
Passwords are stored inside a TrueCrypt volume that's stored inside our key-value store.
==========
I will still gladly accept any answer when some kind of Open Source solution to this problem pops up somewhere. I might share (part of) our solution somewhere myself in the near future.
pip freeze gives you a listing of all installed packages. Bonus: if you redirect the output to a file, you can use it as part of your deployment process to install all those packages (pip can programmatically install all packages from the file).
I see you're already using virtualenv. Good. You can run pip freeze -E myvirtualenv > myproject.reqs to generate a dependency file that doubles as a status report of the Python environment.
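Building on that, the freeze output can be folded into a small per-deploy record, which is essentially the key-value store described in the edit above. A rough sketch (the hg command, file layout and field names are illustrative):

    # Record "what's running where" after each deploy: the host, the deployed
    # revision and the environment's pip freeze output, written as JSON so it
    # can be committed to the key-value store / DCVS mentioned above.
    import datetime
    import json
    import socket
    import subprocess

    def record_deployment(project, store_path):
        info = {
            "project": project,
            "host": socket.gethostname(),
            "deployed_at": datetime.datetime.utcnow().isoformat(),
            "revision": subprocess.check_output(["hg", "id", "-i"]).decode().strip(),
            "requirements": subprocess.check_output(["pip", "freeze"]).decode().splitlines(),
        }
        with open(store_path, "w") as f:
            json.dump(info, f, indent=2)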
Perhaps you want something like Opscode Chef.
In their own words:
Chef works by allowing you to write recipes that describe how you want a part of your server (such as Apache, MySQL, or Hadoop) to be configured. These recipes describe a series of resources that should be in a particular state - for example, packages that should be installed, services that should be running, or files that should be written. We then make sure that each resource is properly configured, only taking corrective action when it's necessary. The result is a safe, flexible mechanism for making sure your servers are always running exactly how you want them to be.
EDIT: Note Chef is not a Python tool, it is a general purpose tool, written in Ruby (it seems). But it is capable of supporting various "cookbooks", including one for installing/maintaining Python apps.
