Python distribution package based on local git commit - python

I am trying to create a Python distribution package following https://packaging.python.org/en/latest/tutorials/packaging-projects/. My source folder contains many irrelevant files and subfolders which should be excluded from the distribution, such as temporary files, auxiliary code, test output files, private notes, etc. The knowledge of which files are relevant and which are not is already represented in the git source control. Instead of replicating all of this as inclusion/exclusion rules in the pyproject configuration file (and needing to keep it up to date going forward), I would like my build chain to be based on a git commit/tag. This would also be useful for keeping versioning in sync between my repo history and PyPI. I know there is an option to do this with GitHub Actions, but my question is how to do it locally, based just on git rather than GitHub.
Edit following a comment: I agree that you don't always want the repo and distro trees to be the same, but it would be much simpler to control if the distro started from the repo tree as a baseline, with a few additional exclusion rules on top of that.

To automatically include files from a Git or Mercurial repository you can use setuptools_scm. The tool can also automatically set the software version from a repository tag and the number of commits since that tag. It prepares this data for standard setuptools.
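For example, a minimal pyproject.toml using setuptools_scm might look like the sketch below (the project name is a placeholder and the version pins are illustrative):

[build-system]
requires = ["setuptools>=64", "setuptools_scm>=8"]
build-backend = "setuptools.build_meta"

[project]
name = "your-package"
dynamic = ["version"]

[tool.setuptools_scm]
# The presence of this table enables setuptools_scm; with no options it
# derives the version from the most recent git tag plus the commits since it.

With this in place, running python -m build produces an sdist whose contents are limited to git-tracked files, and whose version is derived from the latest tag (with a dev suffix if there are commits after it).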

Related

Correct way to develop a Python package at the same time as an application

I'm developing an application; it is at an early stage.
At the same time I've started another project that will need some files from the first one, so now the point is to extract those files into a package that I can use from both projects.
Both projects are under git, so "linking" doesn't look like a good idea.
The common functions are at an early stage, which means a lot of changes in the near future.
I think the best solution is to extract the common code into a new repository as a package, but I don't know what the most productive way to do that is.
If I make a package and install it, every change will need a reinstallation, so debugging could be tedious.
What is the most common or recommended way to do this?
You can use Git submodules for this purpose: you include your library inside your main project.
git submodule add git@github.com:your-library.git
This command creates .gitmodules with your configuration and adds a your-library folder with the library's code. You can commit changes to your-library directly from this new folder:
cd your-library
touch new-file.txt
git add .
git commit -m "changes"
git push
You can also pull changes for the library:
cd your-library
git pull
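Note that after committing inside the submodule, the parent repository still points at the submodule's old commit; you have to record the new one explicitly (the cd .. assumes you are still inside the your-library folder from the previous step):
cd ..
git add your-library
git commit -m "Update your-library to latest commit"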

How to keep runtime and development `requirements_*.txt` up to date?

I would like to keep multiple requirements_*.txt files up to date while working on a project. Some packages my project depends on are required at runtime, while others are required during development only. Since these packages may have their own dependencies as well, it is hard to tell which dependency should go in which requirements_*.txt file.
If I would like to keep track of the runtime dependencies in requirements_prod.txt and of the development dependencies in requirements_dev.txt, how should I keep both files up to date and clean if I add packages during development? Running a mere pip freeze > requirements_prod.txt would list all installed dependencies, including those only needed for development. This would pollute either of the requirements_*.txt files.
Ideally, I would like to mark a package on installation as 'development' or 'runtime' and have it (and its own dependencies) written to the correct requirements_*.txt.
Edit:
@Brian: My question is slightly different from that question because I would like my requirements_*.txt files to stay side by side in the same branch, not in different branches. So my requirements_*.txt files should always be in the same commits.
Brian's answer clarifies things a lot for me:
Usually you only want to add direct dependencies to your requirements file.
(...) Both of those files should be maintained manually
So instead of generating the requirements_*.txt files automatically using pip freeze, they should be maintained manually and need only contain direct dependencies.
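A common way to keep both files manual yet avoid duplication is to have the development file include the production one via pip's -r directive (the package names below are just examples):

# requirements_prod.txt: direct runtime dependencies only
requests

# requirements_dev.txt: everything from production, plus dev-only tools
-r requirements_prod.txt
pytest

Running pip install -r requirements_dev.txt then sets up a full development environment, while deployment uses only requirements_prod.txt.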

Where to place example data in python package?

I'm setting up my first Python package, and I want to install it with some example data so that users can run the code straight away. In case it's relevant, my package is on GitHub and I'm using pip.
At the moment my example data is being installed with the rest of the package into site-packages/, by setting include_package_data=True in setup.py and referencing the files I want to include in MANIFEST.in. However, while this makes sense to me for files used by the code as part of its processing, it doesn't seem especially appropriate for example data.
What is best/standard practice for deploying example data with a python package?
You can put your example data in the repository, in an examples folder next to your project sources, and exclude it from the package with prune examples in your manifest file.
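For instance, a MANIFEST.in along these lines (package and path names are illustrative) ships the data your code needs but keeps the examples out of the distribution:

include README.md
recursive-include mypackage/data *
prune examples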
There is actually no universal and standard advice for that. Do whatever suits your needs.

Git and packaging: common pattern to keep packaging-files out of the way?

For the last Python project I developed, I used git as the versioning system. Now it's that time of the development cycle when I should begin to ship packages out to the beta testers (in my case they would be .deb packages).
In order to build my packages I need a number of extra files (copyright, icon.xpm, setup.py, setup.cfg, stdeb.cfg, etc.), but I would like to keep them separate from the source of the program, as the source might be used to prepare packages for other platforms, and it would make no sense to have those Debian-specific files lingering around.
My question: is there a standard way or best practice for doing so? In my Google wanderings I stumbled a couple of times (including here on SO) upon the git-buildpackage suite, but I am not sure this is what I am looking for, as it seems to be intended for packagers who download a tar.gz from an upstream repository.
I thought a possible way to achieve what I want would be to have a branch in the git repository where I keep my packaging files, but this branch should also be able to "see" the files on the master branch without my having to manually merge master into the packaging branch every time. However:
I don't know if this is a good idea / the way it should be done
Although I suspect it might involve some git symbolic-ref magic, I have no idea how to do what I imagined
Any help appreciated, thanks in advance for your time!
Why would you want to keep those out of source control? They are part of the inputs to generate the final, built output! You definitely don't want to lose track of them, and you probably want to keep track of how they change over time as you continue to develop your application.
What you most likely want to do is create a subdirectory for all of these distribution-specific files to live in, say ./debian or ./packaging/debian, and commit them there. You can have a makefile or some such that, when run in that directory, copies all of the files where they need to be to create the package, and you'll be in great shape!
In the end I settled on a branch with a makefile in it, so that my packaging procedure now looks something like:
git checkout debian-packaging
make get-source
make deb
<copy-my-package-out-of-the-way-here>
make reset
If you are interested you can find the full makefile here (disclaimer: that is my first makefile ever, so it's quite possible it's not the best makefile you will ever see).
In a nutshell, the core of the "trick" is in the get-source target: it uses the git archive command, which accepts the name of a branch as an argument and produces a tarball with the source from that branch. Here's the snippet:
# Fetch the source code from the desired branch
get-source:
	git archive $(SOURCE_BRANCH) -o $(SOURCE_BRANCH).tar
	tar xf $(SOURCE_BRANCH).tar
	rm $(SOURCE_BRANCH).tar
	@echo "The source code has been fetched."
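As an aside, if you don't want a makefile at all, essentially the same thing can be done with a single git archive invocation piped straight into tar (the branch and prefix names here are just examples):

git archive --prefix=myproject/ master | tar -x

The --prefix option makes the extracted files land in a myproject/ subdirectory instead of being scattered in the current directory.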
Hope this helps somebody else too!
You can keep the extra files in a separate repo (so that they are versioned too) and use submodules to bring them into your source code repo.

How to access the original tarball when packaging for Debian?

I am packaging a piece of Python software that uses DistUtilsExtra. When running python setup.py install in my debian/rules, DistUtilsExtra automatically recompiles the translation template .pot file and updates it directly in the source repository. As a result of that, the second time I execute the packaging commands (be it with debuild or pdebuild) an automatic patch file gets created (since it thinks I have manually updated the .pot file). This patch is obviously unwanted in the Debian package and I am searching for a way to not generate it.
One solution would be for DistUtilsExtra to not change the .pot file in the source repository, but for now that's not possible. I am thus testing another solution: create an override for the clean instruction that extracts the original .pot file from the .orig.tar.gz tarball, done like this in debian/rules:
override_dh_clean:
	tar -zxvf ../<projname>_*.orig.tar.gz --wildcards --strip-components=1 <projname>-*/po/<projname>.pot
	dh_clean
However, I've been told on the debian-mentors mailing list that the original tarball is not guaranteed to be located in ../. I am thus wondering if there is a way to reliably access the .orig.tar.gz tarball from inside debian/rules, like a "variable" that would contain its location.
This is not strictly speaking an answer to the question "How to access the original tarball when packaging for Debian?", but this is how I solved the problem that provoked my question, so here it is:
I found an interesting blog post by Raphaël Hertzog that explains how to ignore auto-generated files when building a Debian package. This is done by passing the --extend-diff-ignore option to dpkg-source via the debian/source/options file. I have thus removed the proposed command from override_dh_clean, and the unwanted automatic patch is no longer created.
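Concretely, this amounts to a single line in debian/source/options; the regular expression below targets the .pot file from the question and is just an example:

# debian/source/options
extend-diff-ignore = "(^|/)po/.*\.pot$"

dpkg-source then ignores changes to matching files when generating the automatic patch.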
The usual solution for automatically generated files is to delete them during clean.
