Combine multiple git project forks of python algorithms into one - python

I am currently working on a project paper whose practical part has one core algorithm and several side algorithms that supplement it. They never have to run at the same time: the side algorithms compute data first, and the core algorithm uses this precomputed data afterwards. For each of them there is a git repo that I forked and modified. Also, and this is really important, all of them use some shared Python libraries I wrote. At the moment I copy these into each repo by hand, which means that whenever I fix a bug or make a change, I have to manually copy and paste it into the other projects.
So my project structure at the moment is as follows:
Repo1/
    folder/
        shared.py
Repo2/
    folder/
        shared.py
Repo3/
    folder/
        shared.py
Now my questions:
1. Is there a better way to share those Python libraries between the projects?
2. What would be the proper git way to merge those forks into one, new repository? I thought about submodules, but does this really make sense if I'm not interested in new commits released in those repos because I already diverged too far from them?

A submodule (which you considered) remains the practical solution for your shared library: "folder" becomes the root of a nested repository containing your shared library.
Once you fix something in the shared library in one of those projects, you can commit and push, then, in each of the other projects, run:
git submodule update --remote
They would benefit from the fix immediately.
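
If several project checkouts embed the shared library as a submodule, the per-project update step can be scripted. The following is only a sketch: the Repo1/Repo2/Repo3 paths come from the question's layout, the helper names are hypothetical, and dry_run defaults to True so nothing is executed unless explicitly requested.

```python
import subprocess


def submodule_update_command(repo_dir):
    """Build the git command that updates submodules in one checkout.

    `git -C <dir>` runs git as if it had been started inside <dir>.
    """
    return ["git", "-C", str(repo_dir), "submodule", "update", "--remote"]


def update_all(repo_dirs, dry_run=True):
    """Return the update commands; run them only when dry_run is False."""
    commands = [submodule_update_command(d) for d in repo_dirs]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if git exits non-zero.
            subprocess.run(cmd, check=True)
    return commands


# Hypothetical checkout paths, named after the layout in the question.
commands = update_all(["Repo1", "Repo2", "Repo3"])
for cmd in commands:
    print(" ".join(cmd))
```

Calling update_all(..., dry_run=False) would actually invoke git, so it assumes git is installed and that each path is a checkout with the submodule already initialised.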

Related

Correct way to develop a python package at same time than application

I'm developing an application, it is in an early stage.
At the same time I've started another project that will need some files from the first one, so the point now is to extract those files into a package that I will use from both projects.
Both projects are under git, so "linking" doesn't look like a good idea.
The common functions are at an early stage, which means a lot of changes in the near future.
I think the best solution is to extract the common code into a new repository as a package, but I don't know the most productive way to do that.
If I make a package and install it, every change will need a reinstallation, so debugging could be very tedious.
Which is the most common or recommended way to do that?
You can use Git Submodules for this purpose. You include your library inside your main project.
git submodule add git@github.com:your-library.git
This command creates a .gitmodules file with your configuration and adds a your-library folder with the library's code. You can commit changes to your-library directly from this new folder.
cd your-library
touch new-file.txt
git add .
git commit -m "changes"
git push
You can also pull changes for the library:
cd your-library
git pull

maintaining Package vs Sub module

I am working on a big project which includes a number of different repositories.
After some discussions we decided that we might need to create packages from a number of repos.
The problem is that we have two different repositories using the same submodule.
Repositories named Control-1 and Control-2 both use a submodule named HW-control.
HW-control is a core repository used by these two other repos, and the dependency goes in both directions.
When testing or developing new features of Control-1 or Control-2 we use HW-control functionality,
but at the same time, when we develop or test features of HW-control, we need Control-1 or Control-2 in order to run the tests.
The dilemma is that if we make HW-control a package, then for every new feature we develop we need to release a new package version, update it in our environments, and then run the tests with Control-1 or Control-2.
Or we edit the package files directly, and I can already see the mess that would create.
On the other hand, it would be much easier to maintain releases and use it as a package rather than as a submodule.
So the question is: should we make HW-control a package, or should we keep it as a submodule?
What is the correct, or at least the better, way of doing this?
Thank you all in advance! Will appreciate any help!

Sharing util modules between actively developed apps

We have a growing library of apps depending on a set of common util modules. We'd like to:
- share the same utils codebase between all projects
- allow utils to be extended (and fixed!) by developers working on any project
- have this be reasonably simple to use for devs (i.e. not a big disruption to workflow)
- cross-platform (no diffs for devs on Macs/Win/Linux)
We currently do this "manually", with the utils versioned as part of each app. This has its benefits, but is also quite painful to repeatedly fix bugs across a growing number of codebases.
On the plus side, it's very simple to deal with in terms of workflow - util module is part of each app, so on that side there is zero overhead.
We also considered (fleetingly) using filesystem links or some such (not portable between OS's)
I understand the implications about release testing and breakage, etc. These are less of a problem than the mismatched utils are at the moment.
You can take advantage of Python's module search path (the paths searched when looking for a module to import).
You can create a separate directory for the utils and keep it in a different repository from the projects that use them. Then add the path to this repository to PYTHONPATH.
This way, if you write import mymodule, Python will find mymodule in the directory containing the utils. So, basically, it will work much as it does for standard Python modules.
This way you will have one repository for utils (or separate for each util, if you wish), and separate repositories for other projects, regardless of the version control system you use.
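
As a minimal sketch of the search-path mechanism described above: the snippet below writes a throwaway mymodule.py into a temporary directory (standing in for a checkout of the utils repository), adds that directory to sys.path at runtime, and imports it. The helper name import_from_utils_dir and the greet function are hypothetical; listing the directory in PYTHONPATH before starting Python should have much the same effect as the sys.path.insert call.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_utils_dir(utils_dir, module_name):
    """Import module_name after putting utils_dir on the module search path."""
    utils_dir = str(Path(utils_dir).resolve())
    if utils_dir not in sys.path:
        # Entries earlier in sys.path win, so insert at the front.
        sys.path.insert(0, utils_dir)
    # Make sure the import system notices files created after startup.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Demo: a temporary directory plays the role of the shared utils checkout.
with tempfile.TemporaryDirectory() as tmp:
    source = "def greet():\n    return 'hello from utils'\n"
    (Path(tmp) / "mymodule.py").write_text(source)
    mymodule = import_from_utils_dir(tmp, "mymodule")
    message = mymodule.greet()

print(message)
```

In a real setup the directory would be a persistent checkout of the utils repository, and each project would import from it without carrying its own copy.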
Which version control system are you using? If you are using git, take a look at submodules. The idea in this case is that you would keep a single, separate repository with the utils, which would be pulled into the various projects automatically.
I have no direct experience with mercurial, but I believe subrepositories are the equivalent feature.
If you are under SVN... wait... I hope not! :)

Git and packaging: common pattern to keep packaging-files out of the way?

For the last python project I developed, I used git as versioning system. Now it's that time of the development cycle in which I should begin to ship out packages to the beta testers (in my case they would be .deb packages).
In order to build my packages I need a number of extra files (copyright, icon.xpm, setup.py, setup.cfg, stdeb.cfg, etc.), but I would like to keep them separate from the source of the program, as the source might be used to prepare packages for other platforms, and it would make no sense to have those Debian-specific files lingering around.
My question: is there a standard way or best practice for this? In my Google wanderings I stumbled a couple of times (including here on SO) on the git-buildpackage suite, but I am not sure it is what I am looking for, as it seems to be intended for packagers who download a tar.gz from an upstream repository.
I thought a possible way to achieve what I want would be to have a branch in the git repository where I keep my packaging files, but this branch should also be able to "see" the files on the master branch without my having to manually merge master into the packaging branch every time. However:
I don't know if this is a good idea / the way it should be done
Although I suspect it might involve some git symbolic-ref magic, I have no idea how to do what I imagined
Any help appreciated, thanks in advance for your time!
Why would you want to keep those out of source control? They are part of the inputs that generate the final, built output! You definitely don't want to lose track of them, and you probably want to keep track of how they change over time as you continue to develop your application.
What you most likely want to do is create a subdirectory for all of these distribution-specific files to live in, say ./debian or ./packaging/debian, and commit them there. You can have a makefile or some such that, when you run it in that directory, copies all of the files to where they need to be to create the package, and you'll be in great shape!
In the end I settled for a branch with a makefile in it, so that my packaging procedure now looks something like:
git checkout debian-packaging
make get-source
make deb
<copy-my-package-out-of-the-way-here>
make reset
If you are interested you can find the full makefile here (disclaimer: that is my first makefile ever, so it's quite possible it's not the best makefile you will ever see).
In a nutshell, the core of the "trick" is the get-source target, which uses the git archive command: it accepts the name of a branch as an argument and produces a tarball with the source from that branch. Here's the snippet:
# Fetch the source code from desired branch
get-source:
	git archive $(SOURCE_BRANCH) -o $(SOURCE_BRANCH).tar
	tar xf $(SOURCE_BRANCH).tar
	rm $(SOURCE_BRANCH).tar
	@echo "The source code has been fetched."
Hope this helps somebody else too!
You can keep the extra files in a separate repo (so that they are versioned too) and use submodules to include it in your source code repo.

Using Mercurial to separate three versions: official/development/testing/

I'm working on deploying a Python module composed of several dozen files and folders; I use Mercurial for managing the software changes.
I want to keep the same module in three branches: the official one (which the team uses), the development one (this may be more than one development branch), and the testing branch (not the testing of the official branch, but a collection of test related to a third party module used by my module - regression testing when the third party module makes new releases).
How can I accomplish this in Mercurial? Should I simply name three branches in the same folder, or clone one version into three places and maintain them separately?
Any insight on how to manage this in general would be appreciated.
Thank you.
The "official" way would be to clone your repo once for each branch you need.
But named branches within a single repo are also acceptable, especially if you don't need to work simultaneously on different development efforts (each associated with its own branch).
I find the "Guide to Branching Model in Mercurial" very instructive on this kind of choice.
Other information on Mercurial branches in this SO question as well.
