I'm working on deploying a Python module composed of several dozen files and folders; I use Mercurial for managing the software changes.
I want to keep the same module in three branches: the official one (which the team uses), the development one (there may be more than one development branch), and the testing branch (not for testing the official branch, but a collection of tests related to a third-party module used by my module - regression testing for when the third-party module makes new releases).
How can I accomplish this in Mercurial? Should I simply name three branches in the same folder, or clone one version into three places and maintain them separately?
Any insight on how to manage this in general would be appreciated.
Thank you.
The "official" way would be cloning your repo in as many branch as you need.
But named branches within a repo is also acceptable, especially if you don't need to work simultaneously on different development efforts (each associated to their respective branch)
I find the "Guide to Branching Model in Mercurial" very instructive on this kind of choice.
There is more information on Mercurial branches in this SO question as well.
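As a rough sketch of the two options (repository paths and branch names here are just placeholders):

hg clone project project-dev
hg clone project project-testing

or, with named branches inside a single repository:

hg branch dev
hg commit -m "Start the dev branch"
hg update default

Note that hg branch only marks the branch name for the next commit, and hg update default switches you back to the official line of development.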
I am working on a big project which includes a number of different repositories.
After some discussions we decided that we might need to create packages from a number of repos.
The problem is that we have two different repositories using the same submodule.
A repository named Control-1 and a repository named Control-2 both use a submodule named HW-control.
HW-control is a core repository used by these two other repos, and the dependency goes in both directions.
When testing or developing new features of Control-1 or Control-2 we use HW-control functionality,
but at the same time, when we are developing or testing features of HW-control, we need to use Control-1 or Control-2 in order to run the tests.
The dilemma is that if we make HW-control a package, then for every new feature we develop we need to release a new package version, update it in our environments, and only then run the tests with Control-1 or Control-2.
The alternative is to edit the installed package files directly, and I can already see the mess that would create.
On the other hand, it would be much easier to maintain releases and use it as a package rather than as a submodule.
So the question is: should we make HW-control a package, or should we keep it as a submodule?
What is the correct way, or at least the better option, of doing this right?
Thank you all in advance! I will appreciate any help!
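For context, here is a rough sketch of the two options we are weighing (the URL and paths are hypothetical). Keeping HW-control as a submodule inside Control-1 or Control-2 would look like:

git submodule add https://example.com/org/HW-control.git HW-control
git submodule update --init

while the package route could avoid cutting a release for every change during development by installing a local clone of HW-control in editable mode:

pip install -e ../HW-control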
I am currently working on a project paper where, for the practical part, I have one core algorithm and multiple side algorithms that supplement the work of the core algorithm. They never have to run at the same time, however, as one first computes things with those side algorithms and the core algorithm uses this precomputed data afterwards. For each of them there is a Git repo that I forked and made some modifications to. Also, and this is really important, all of them use some shared Python libraries I wrote. At the moment, I just copy these into each repo by hand, which means that if I find a bug or make some changes, I have to manually copy and paste them into the other projects.
So my project structure at the moment is as follows
Repo1/
    folder/
        shared.py
Repo2/
    folder/
        shared.py
Repo3/
    folder/
        shared.py
Now my questions:
1. Is there a better way to share those Python libraries between the projects?
2. What would be the proper Git way to merge those forks into one new repository? I thought about submodules, but does this really make sense if I'm not interested in new commits from those upstream repos because I have already diverged too far from them?
A submodule (which you considered) remains the practical solution for your shared library: it means "folder" becomes the root of a nested repository with your shared library in it.
Once you fix something in that shared library from within one of those projects, you can commit and push; then, in each of the other projects, run:
git submodule update --remote
They will benefit from the fix immediately.
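As a rough sketch of the initial setup, assuming the shared library gets its own repository (the URL and the "folder" name are placeholders):

git submodule add https://example.com/you/shared-lib.git folder
git commit -m "Track the shared library as a submodule"

After running the update command above in another project, remember to commit the new submodule pointer there (git add folder followed by git commit) so the fix is recorded in that project's history.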
I have a big Python 3.7+ project and I am currently in the process of splitting it into multiple packages that can be installed separately. My initial thought was to have a single Git repository with multiple packages, each with its own setup.py. However, while doing some research on Google, I found people suggesting one repository per package (e.g., Python - setuptools - working on two dependent packages (in a single repo?)), but nobody provides a good explanation as to why they prefer such a structure.
So, my questions are the following:
What are the implications of having multiple packages (each with its own setup.py) on the same GitHub repo?
Am I going to face issues with such a setup?
Are the common Python tools (documentation generators, PyPI packaging, etc.) compatible with such a setup?
Is there a good reason to prefer one setup over the other?
Please keep in mind that this is not an opinion-based question. I want to know if there are any technical issues or problems with any of the two approaches.
Also, I am aware (and please correct me if I am wrong) that setuptools now allows installing dependencies from GitHub repos, even if the setup.py is not at the root of the repository.
One aspect is covered here
https://pip.readthedocs.io/en/stable/reference/pip_install/#vcs-support
In particular, if setup.py is not in the root directory you have to specify the subdirectory where to find setup.py in the pip install command.
So if your repository layout is:
pkg_dir/
    setup.py  # setup.py for package pkg
    some_module.py
other_dir/
    some_file
some_other_file
You'll need to use pip install -e "vcs+protocol://repo_url/#egg=pkg&subdirectory=pkg_dir" (quoting the URL so the shell does not treat the & as a command separator).
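For instance, for a package hosted in a Git repository this could look like the following (the URL, package name, and subdirectory are hypothetical):

pip install -e "git+https://github.com/someuser/somerepo.git#egg=pkg&subdirectory=pkg_dir"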
"Best" approach? That's a matter of opinion, which is not the domain of SO. But here are a couple of justifications for creating separate packages:
Package is functionally independent of the other packages in your project.
That is, it doesn't import from them and it performs a function that could be useful to other developers. Extra points if the function this package performs is similar to packages already in PyPI.
Extra points if the package has a stable API and clear documentation. Penalty points if the package is a thin grab bag of unrelated functions that you factored out of multiple packages for ease of maintenance, but the functions don't have a unifying principle.
The package is optional with respect to your main project, so there'd be cases where users could reasonably choose to skip installing it.
Perhaps one package is a "client" and the other is the "server". Or perhaps the package provides OS-specific capabilities.
Note that a package like this is not functionally independent of the main project and so does not qualify under the previous bullet point, but this would still be a good reason to separate it.
I agree with @boriska's point that the "single package" project structure is a maintenance convenience well worth striving for. But not (and this is just my opinion, I'm going to get downvoted for expressing it) at the expense of cluttering up the public package index with a large number of small packages that are never installed separately.
I am researching the same issue myself. The PyPA documentation recommends the layout described in the 'native' subdirectory of: https://github.com/pypa/sample-namespace-packages
I find the single-package structure described below very useful; see the discussion around testing the 'installed' version.
https://blog.ionelmc.ro/2014/05/25/python-packaging/#the-structure
I think this can be extended to multiple packages. Will post as I learn more.
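For reference, the structure advocated in that post is roughly the following (a sketch from memory; see the post for the exact layout):

project-name/
    setup.py
    src/
        package_name/
            __init__.py
    tests/
    docs/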
The major problem I've faced when splitting two interdependent packages into two repos came from CI and testing, specifically branch protections.
Say you have package A and package B and you make some (breaking) changes in both. The automated tests for package A fail because they use the main branch of B (which is no longer compatible with the new version of A), so you can't merge your changes to A. And the same problem occurs the other way around.
tl;dr:
After breaking changes, the automated tests run on merge will fail because they use the main branch of the other repo, making it impossible to merge.
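As a hypothetical illustration of that setup, the CI job for package A typically installs its sibling straight from the other repository's main branch, with something like (repository URL and package names are placeholders):

pip install "git+https://github.com/org/package-B.git@main#egg=package-B"

so a breaking change on either side turns the other side's required checks red until both changes land.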
Our software is modular and I have about 20 git repos in one project.
If a test fails, it is sometimes hard to find the matching commit since several developers work on these 20 repos.
I know the test worked yesterday and fails reproducibly today.
Sometimes I use git bisect, but this works only for one Git repo.
Often changes in two git repos make a test fail.
I could write a dirty script which loops over my N git repos myself, but before doing so, I would like to know how experts would solve this.
I use Python, Django and pytest, but AFAIK this does not matter for this question.
I personally prefer to use the repo tool to manage complex projects. Put those 20 repos in a manifest.xml, and each time a build starts, create a snapshot manifest with all projects pinned to their current commits; if the build fails, do repo diffmanifests against the previous snapshot to see what was changed and where.
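A minimal sketch of that workflow, assuming repo is already initialized against your manifest (the file names are arbitrary):

repo manifest -r -o build-42.xml
repo diffmanifests build-41.xml build-42.xml

The first command writes a manifest with every project pinned to its current commit; the second lists the projects and commits that changed between the two snapshots.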
There is a category of QA tools for "reverse dependency" CI builds, so your higher-level projects get rebuilt every time a lower-level change is made. At scale it can be resource-intensive.
The entire class of problems is removed if you stop dealing with repo-to-repo relationships and start following a version-release methodology for the subcomponents. Then you can track the versions of lower-level dependencies and know, when you go to upgrade, whether something broke. Your CI could build against several versions of dependencies if you wanted to systematize it.
Git submodules accomplish that tracking for individual commits, so you again get to decide when to incorporate changes from lower levels. (Notably, that can also be used like released versions if you only ever update to tagged release commits.)
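For example, updating a submodule only to tagged release commits looks roughly like this (the path and the tag are placeholders):

git -C deps/lower-level fetch --tags
git -C deps/lower-level checkout v1.4.0
git add deps/lower-level
git commit -m "Bump lower-level dependency to v1.4.0"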
I've been using zc.buildout more and more and I'm encountering problems with some recipes that I have solutions to.
These packages generally fall into several categories:
Packages with no obvious link to a project site
Packages with links to a free hosted service like GitHub or Google Code
Setup #2 is better than #1, but not by much, because in both of these situations I would have to wait for the developer to apply these changes before I can use the updated package in my buildout.
What I've been doing up to this point is basically forking the package, giving it a different name, and uploading it to PyPI, but this is creating redundancy and I think it is only aggravating the problem.
One possible solution is to use a personal package index server where I would upload updated versions of the code until the developer updates his/her package. This is doable, but it adds additional work that I would prefer to avoid.
Is there a better way to do this?
Thank you
Your "upload my personalized fork" solution sounds like a terrible idea. You should try http://pypi.python.org/pypi/collective.recipe.patch which lets you automatically patch eggs. Try setting up a local PyPi-compatible index. I think you can also point find-links = at a directory (not just a http:// url) containing your personal versions of those "almost good enough" packages. You can also try monkey patching the defective package, or take advantage of the Zope component model to override the necessary bits in a new package. Often the real authors are listed somewhere in the source code of a package, even if they decided not to put their names up on PyPi.
I've been trying to cut down on the number of custom versions of packages I use. Usually I work with customized packages as develop eggs by linking src/some.project to my checkout of that project's code. I don't have to build a new egg or reinstall every time I edit those packages.
A lot of Python packages used in buildouts are hosted in Plone's svn collective. It's relatively easy to get commit access to that repository.