maintaining Package vs Sub module


I am working on a big project that spans a number of different repositories.
After some discussion, we decided that we might need to create packages from some of these repos.
The problem is that two different repositories use the same submodule.
The repositories Control-1 and Control-2 both use a submodule named HW-control.
HW-control is a core repository used by these two repos, and the dependency goes in both directions.
When testing or developing new features of Control-1 or Control-2, we use HW-control functionality,
but when we develop or test features of HW-control, we need Control-1 or Control-2 in order to run the tests.
The dilemma: if we make HW-control a package, then for every new feature we develop we need to release a new package version, update it in our environments, and only then run the tests with Control-1 or Control-2.
The alternative is to edit the installed package files directly, and I can already see the mess that would create.
On the other hand, it would be much easier to maintain releases and use it as a package rather than as a submodule.
So the question is: should we make HW-control a package, or should we keep it as a submodule?
What is the correct way, or the better option, to do this?
Thank you all in advance! I will appreciate any help!

Related

Combine multiple git project forks of python algorithms into one

I am currently working on a project paper where, for the practical part, I have one core algorithm and multiple side algorithms that supplement the core algorithm. They never have to run at the same time: the side algorithms calculate some data first, and the core algorithm uses this precomputed data afterwards. For each of them there is a git repo that I forked and modified. Also, and this is really important, all of them use some shared Python libraries I wrote. At the moment I just copy these into each repo by hand, which means that if I find a bug or make a change, I have to manually copy and paste it into the other projects.
So my project structure at the moment is as follows
Repo1/
    folder/
        shared.py
Repo2/
    folder/
        shared.py
Repo3/
    folder/
        shared.py
Now my questions:
1. Is there a better way to share those Python libraries between the projects?
2. What would be the proper git way to merge those forks into one, new repository? I thought about submodules, but does this really make sense if I'm not interested in new commits released in those repos because I already diverged too far from them?

A submodule (which you considered) remains the practical solution for your shared library: "folder" becomes the root folder of a nested repository that contains your shared library.
Once you fix something in that shared library in one of those projects, you can commit and push, and then, in each of the other projects, run:
git submodule update --remote
They would benefit from the fix immediately.
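As a rough sketch of that update step, the snippet below runs the same command in each consuming repository. The repository names are taken from the question, the helper names are made up for illustration, and `git -C <path>` simply runs git inside that directory. Keep in mind that `--remote` only moves the submodule checkout forward; the new submodule commit still has to be committed in each parent repo to be recorded.

```python
import subprocess
from pathlib import Path
from typing import List


def submodule_update_command(repo: Path) -> List[str]:
    """Build the git command that updates the submodules of one repo."""
    return ["git", "-C", str(repo), "submodule", "update", "--remote"]


def update_all(repos: List[Path], dry_run: bool = True) -> List[List[str]]:
    """Collect the update command for every repo; run them unless dry_run is set."""
    commands = [submodule_update_command(repo) for repo in repos]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if git exits with an error
            subprocess.run(command, check=True)
    return commands


# Repo2 and Repo3 are the "other projects" from the question (dry run only).
commands = update_all([Path("Repo2"), Path("Repo3")])
for command in commands:
    print(" ".join(command))
```

Calling `update_all(..., dry_run=False)` would actually invoke git, so it only makes sense from a directory that contains those checkouts.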

Python: Multiple packages in one repository or one package per repository?

I have a big Python 3.7+ project and I am currently in the process of splitting it into multiple packages that can be installed separately. My initial thought was to have a single Git repository with multiple packages, each with its own setup.py. However, while doing some research on Google, I found people suggesting one repository per package: (e.g., Python - setuptools - working on two dependent packages (in a single repo?)). However, nobody provides a good explanation as to why they prefer such structure.
So, my questions are the following:
What are the implications of having multiple packages (each with its own setup.py) in the same GitHub repo?
Am I going to face issues with such a setup?
Are the common Python tools (documentation generators, PyPI packaging, etc.) compatible with such a setup?
Is there a good reason to prefer one setup over the other?
Please keep in mind that this is not an opinion-based question. I want to know if there are any technical issues or problems with any of the two approaches.
Also, I am aware (and please correct me if I am wrong) that setuptools now allows installing dependencies from GitHub repos, even if the setup.py is not at the root of the repository.

One aspect is covered here
https://pip.readthedocs.io/en/stable/reference/pip_install/#vcs-support
In particular, if setup.py is not in the root directory you have to specify the subdirectory where to find setup.py in the pip install command.
So if your repository layout is:
pkg_dir/
    setup.py  # setup.py for package pkg
    some_module.py
other_dir/
    some_file
some_other_file
You'll need to use pip install -e "vcs+protocol://repo_url/#egg=pkg&subdirectory=pkg_dir" (quoted, because & is special to most shells).
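A small sketch of how that URL could be assembled and handed to pip from Python. The repository URL and the helper names are hypothetical; only the `#egg=...&subdirectory=...` fragment comes from the pip documentation quoted above. Passing the command as an argument list also sidesteps the shell-quoting problem with `&`.

```python
import sys
from typing import List


def vcs_requirement(repo_url: str, egg: str, subdirectory: str, vcs: str = "git") -> str:
    """Format a pip VCS URL for a package whose setup.py is in a subdirectory."""
    return f"{vcs}+{repo_url}#egg={egg}&subdirectory={subdirectory}"


def editable_install_command(requirement: str) -> List[str]:
    """Build (but do not run) the pip command for an editable install."""
    return [sys.executable, "-m", "pip", "install", "-e", requirement]


# Hypothetical repository; pkg and pkg_dir match the layout shown above.
requirement = vcs_requirement("https://github.com/example-org/monorepo.git", "pkg", "pkg_dir")
command = editable_install_command(requirement)
print(requirement)
```

To perform the install, the list can be passed to `subprocess.run(command, check=True)`.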

"Best" approach? That's a matter of opinion, which is not the domain of SO. But here are a couple of justifications for creating separate packages:
- The package is functionally independent of the other packages in your project.
  That is, it doesn't import from them and performs a function that could be useful to other developers. Extra points if the function this package performs is similar to packages already in PyPI.
  Extra points if the package has a stable API and clear documentation. Penalty points if the package is a thin grab bag of unrelated functions that you factored out of multiple packages for ease of maintenance, but the functions don't have a unifying principle.
- The package is optional with respect to your main project, so there'd be cases where users could reasonably choose to skip installing it.
  Perhaps one package is a "client" and the other is the "server". Or perhaps the package provides OS-specific capabilities.
  Note that a package like this is not functionally independent of the main project and so does not qualify under the previous bullet point, but this would still be a good reason to separate it.
I agree with @boriska's point that the "single package" project structure is a maintenance convenience well worth striving for. But not (and this is just my opinion; I'm going to get downvoted for expressing it) at the expense of cluttering up the public package index with a large number of small packages that are never installed separately.

I am researching the same issue myself. The PyPA documentation recommends the layout described in the 'native' subdirectory of: https://github.com/pypa/sample-namespace-packages
I find the single-package structure described at the link below very useful; see in particular the discussion around testing the 'installed' version.
https://blog.ionelmc.ro/2014/05/25/python-packaging/#the-structure
I think this can be extended to multiple packages. Will post as I learn more.

The major problem I've faced when splitting two interdependent packages into two repos came from CI and testing, specifically branch protections.
Say you have package A and package B and you make some (breaking) changes in both. The automated tests for package A fail because they use the main branch of B (which is not compatible with the new version of A), so you can't merge A. And you have the same problem the other way around.
tl;dr:
After breaking changes, the automated tests on merge will fail because they use the main branch of the other repo, making it impossible to merge.

What is the proper way to work with shared modules in Python development?

I'm working toward adopting Python as part of my team's development tool suite. With the other languages/tools we use, we develop many reusable functions and classes that are specific to the work we do. This standardizes the way we do things and saves a lot of wheel re-inventing.
I can't seem to find any examples of how this is usually handled with Python. Right now I have a development folder on a local drive, with multiple project folders below that, and an additional "common" folder containing packages and modules with re-usable classes and functions. These "common" modules are imported by modules within multiple projects.
Development/
    Common/
        Package_a/
        Package_b/
    Project1/
        Package1_1/
        Package1_2/
    Project2/
        Package2_1/
        Package2_2/
In trying to learn how to distribute a Python application, it seems that there is an assumption that all referenced packages are below the top-level project folder, not collateral to it. The thought also occurred to me that perhaps the correct approach is to develop common/framework modules in a separate project, and once tested, deploy those to each developer's environment by installing to the site-packages folder. However, that also raises questions re distribution.
Can anyone shed light on this, or point me to a resource that discusses this issue?

If you have common code that you want to share across multiple projects, it may be worth storing this code in a physically separate project, which is then imported as a dependency into your other projects. This is easily achieved if you host your common code project on GitHub or Bitbucket, from where you can use pip to install it in any other project. This approach not only helps you share common code across multiple projects, but also helps protect you from inadvertently creating bad dependencies (i.e. those directed from your common code to your non-common code).
The link below provides a good introduction to using pip and virtualenv to manage dependencies. It is definitely worth a read if you and your team are fairly new to working with Python, as this is a very common toolchain for just this kind of problem:
http://dabapps.com/blog/introduction-to-pip-and-virtualenv-python/
And the link below shows you how to pull in dependencies from github using pip:
How to use Python Pip install software, to pull packages from Github?
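One way the "install the common project with pip" idea might look in practice is a requirements line that pins the shared code to a tag. The package name, organisation, and tag below are placeholders, and the `name @ git+URL@ref` form is pip's direct-reference syntax rather than anything prescribed by this answer.

```python
def pinned_requirement(name: str, repo_url: str, ref: str) -> str:
    """Format a requirements.txt line that installs `name` from a git tag or branch."""
    return f"{name} @ git+{repo_url}@{ref}"


# Placeholder names: a shared "common" project pinned to the tag v1.2.0.
line = pinned_requirement("common", "https://github.com/example-org/common.git", "v1.2.0")
print(line)
```

Each project would carry such a line in its own requirements file, so the dependency always points from the project to the common code and never the other way.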

The must-read-first on this kind of stuff is here:
What is the best project structure for a Python application?
in case you haven't seen it (and follow the link in the second answer).
The key is that each major package be importable as if "." was the top level directory, which means that it will also work correctly when installed in a site-packages. What this implies is that major packages should all be flat within the top directory, as in:
myproject-0.1/
    myproject/
        framework/
    packageA/
        sub_package_in_A/
            module.py
    packageB/
        ...
Then both you (within your other packages) and your users can import as:
import myproject
import packageA.sub_package_in_A.module
etc
Which means you should think hard about @MattAnderson's comment, but if you want it to appear as a separately-distributable package, it needs to be in the top directory.
Note this doesn't stop you (or your users) from doing an:
import packageA.sub_package_in_A as sub_package_in_A
but it does stop you from allowing:
import sub_package_in_A
directly.
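To check this behaviour, here is a minimal sketch that recreates part of the layout in a temporary directory and tries both imports. The `__init__.py` files and the `VALUE` constant are assumptions added for the demonstration; the package names are the ones used above.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_layout(root: Path) -> None:
    """Create root/packageA/sub_package_in_A/module.py with empty __init__ files."""
    sub = root / "packageA" / "sub_package_in_A"
    sub.mkdir(parents=True)
    (root / "packageA" / "__init__.py").write_text("")
    (sub / "__init__.py").write_text("")
    (sub / "module.py").write_text("VALUE = 42\n")


def can_import(name: str) -> bool:
    """Return True if the dotted module name can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_layout(root)
    sys.path.insert(0, str(root))  # treat the temp dir as the top-level directory
    importlib.invalidate_caches()
    try:
        full_ok = can_import("packageA.sub_package_in_A.module")
        bare_ok = can_import("sub_package_in_A")
    finally:
        sys.path.remove(str(root))

print(full_ok, bare_ok)
```

The fully qualified import succeeds because `packageA` sits directly under a `sys.path` entry, while the bare `sub_package_in_A` import fails because nothing on `sys.path` contains it at the top level.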

"...it seems that there is an assumption that all referenced packages are below the top-level project folder, not collateral to it."
That's mainly because the current working directory is the first entry in sys.path by default, which makes it very convenient to import modules and packages below that directory.
If you remove it, you can't even import stuff from the current working directory...
$ touch foo.py
$ python
>>> import sys
>>> del sys.path[0]
>>> import foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named foo
"The thought also occurred to me that perhaps the correct approach is to develop common/framework modules in a separate project, and once tested, deploy those to each developer's environment by installing to the site-packages folder."
It's not really a major issue for development. If you're using version control, and all developers check out the source tree in the same structure, you can easily employ relative path hacks to ensure the code works correctly without having to mess around with environment variables or symbolic links.
"However, that also raises questions re distribution."
This is where things can get a bit more complicated, but only if you're planning to release libraries independently of the projects which use them, and/or to have multiple project installers share the same libraries. If that's the case, take a look at distutils (since superseded by setuptools; distutils was removed from the standard library in Python 3.12).
If not, you can simply employ the same relative path hacks used in development to ensure your project works "out of the box".
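The "relative path hacks" mentioned above are not spelled out in the answer. One common form, sketched here under the assumption that every developer checks out the Development/ tree from the question with the same structure, is to compute the Common/ folder relative to the running script and put it on sys.path. The function name is made up for illustration.

```python
import sys
from pathlib import Path


def add_common_to_path(project_file: str, common_name: str = "Common") -> Path:
    """Put <Development>/Common on sys.path, located relative to a project file.

    project_file is expected to live at Development/<Project>/<script>.py,
    so the shared folder is two levels up and then down into Common/.
    """
    development_root = Path(project_file).resolve().parents[1]
    common_dir = development_root / common_name
    if str(common_dir) not in sys.path:
        sys.path.insert(0, str(common_dir))
    return common_dir


# In a real script this would be add_common_to_path(__file__);
# a literal path is used here so the sketch runs anywhere.
common_dir = add_common_to_path("Development/Project1/main.py")
print(common_dir)
```

After this call, `import Package_a` would resolve against Development/Common/ as long as that folder exists in the checkout.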

I think that this is the best reference for creating a distributable python package:
link removed as it leads to a hacked site.
Also, don't feel that you need to nest everything under a single directory. You can do things like
platform/
    core/
        coremodule
    api/
        apimodule
and then do things like from platform.core import coremodule, etc.

Using Mercurial to separate three versions: official/development/testing/

I'm working on deploying a Python module composed of several dozen files and folders; I use Mercurial for managing the software changes.
I want to keep the same module in three branches: the official one (which the team uses), the development one (there may be more than one development branch), and the testing branch (not for testing the official branch, but a collection of tests related to a third-party module used by my module, i.e. regression testing when the third-party module makes new releases).
How can I accomplish this in Mercurial? Should I simply name three branches in the same folder, or clone one version into three places and maintain them separately?
Any insight on how to manage this in general would be appreciated.
Thank you.

The "official" way would be to clone your repo once for each branch you need.
But named branches within a repo are also acceptable, especially if you don't need to work simultaneously on different development efforts (each associated with its respective branch).
I find the "Guide to Branching Model in Mercurial" very instructive on this kind of choice.
Other information on Mercurial branches in this SO question as well.

How to contribute improvements to packages hosted on Cheeseshop ( pypi )?

I've been using zc.buildout more and more and I'm encountering problems with some recipes that I have solutions to.
These packages generally fall into two categories:
1. Packages with no obvious links to a project site
2. Packages with links to a free hosted service like GitHub or Google Code
Setup #2 is better than #1, but not much better, because in both situations I would have to wait for the developer to apply the changes before I can use the updated package in my buildout.
What I've been doing up to this point is basically forking the package, giving it a different name and uploading it to PyPI, but this creates redundancy and I think it only aggravates the problem.
One possible solution is to use a personal package index server where I would upload updated versions of the code until the developer updates their package. This is doable, but it adds additional work that I would prefer to avoid.
Is there a better way to do this?
Thank you

Your "upload my personalized fork" solution sounds like a terrible idea. You should try http://pypi.python.org/pypi/collective.recipe.patch which lets you automatically patch eggs. Alternatively, try setting up a local PyPI-compatible index. I think you can also point find-links = at a directory (not just an http:// URL) containing your personal versions of those "almost good enough" packages. You can also try monkey patching the defective package, or take advantage of the Zope component model to override the necessary bits in a new package. Often the real authors are listed somewhere in the source code of a package, even if they decided not to put their names up on PyPI.
I've been trying to cut down on the number of custom versions of packages I use. Usually I work with customized packages as develop eggs by linking src/some.project to my checkout of that project's code. I don't have to build a new egg or reinstall every time I edit those packages.
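For readers using pip rather than buildout, the closest counterpart to a develop egg is an editable install. That mapping is an assumption about the reader's toolchain, not something this answer states, and the helper name is hypothetical. The sketch only builds the command for the checkout path mentioned above:

```python
import sys
from pathlib import Path
from typing import List


def editable_install_command(checkout: Path) -> List[str]:
    """Build the pip command that installs a local checkout in editable mode."""
    return [sys.executable, "-m", "pip", "install", "-e", str(checkout)]


# src/some.project is the checkout location named in the answer.
command = editable_install_command(Path("src/some.project"))
print(" ".join(command))
```

Running the resulting command (for example with `subprocess.run(command, check=True)`) links the environment to the checkout, so edits to the source are picked up without reinstalling.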
A lot of Python packages used in buildouts are hosted in Plone's svn collective. It's relatively easy to get commit access to that repository.
