I have a few Python packages that I would like to tidy up and publish on PyPI. These packages import a couple of Python modules I've written to augment or simplify certain operations (e.g., reading/writing CSV files with headers by wrapping csv functions), provide handy data structures, etc. Currently these modules live in the top-level directory that holds the code for my projects, and I rely on reaching them by having added that directory to my PYTHONPATH environment variable. (Less than tidy, I know.)
By creating a separate package for these modules and uploading them on PyPI, I could mark such a package as a dependency for the packages I actually want to distribute. These convenience modules are, however, small and of limited use and interest, such that I don't think they warrant distributing as a separate package on PyPI. On the other hand, I am hesitant to copy these convenience modules (i.e., use cp convenience_module.py projectX/.) into each project directory, as this creates multiple copies of the same file both in the VCS repository housing my Python code and in the different source distribution tarballs I would post to PyPI. Is there an elegant solution to this problem?
You don't say why you're hesitant to 'provide copies'. In general, I think a reasonable approach is to think about how you've set things up for yourself to use the convenience modules. Did you install them in site-packages (or equivalent), or did you just depend on them being in the directory you ran the code from? However you use the modules, is that situation ideal, or is there a way that would be nicer for you?
Start with that, and figure out how to automate it through setup.py, which lets you put things wherever you want on the system (though I strongly discourage abusing this capability).
Whether you distribute them as a tarball or with the package that needs them, you still have to maintain all of the files, so the only real question is whether you intend for those convenience modules to develop their own user communities with their own support requests, etc., or whether they're decidedly intended only for use in support of this other module.
If you intend those modules to be used only for the one module, include them in the package, perhaps in a 'utils' package inside the distribution. Otherwise you're just cluttering the index with things people might think are useful, but are really joined at the hip with something else that drives the changes and maintenance of them.
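For illustration only (the names here are made up), the layout could be as simple as:

projectX/
    setup.py
    projectX/
        __init__.py
        core.py
        utils/
            __init__.py
            csv_helpers.py

so core.py can do from projectX.utils import csv_helpers and the helpers ship inside the one distribution without ever appearing on PyPI as a package of their own.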
If you intend those modules to be generic, and intend to maintain them as such, and think they have use outside of supporting this module, distribute them separately.
As far as I know, distributing these small packages via PyPI is the only viable option. Yes, it clutters the index with near-useless packages, but it's something that should be solved by the PyPI maintainers, not by package developers. Another alternative is to use the stdlib's (or other utility packages') data structures and functions rather than reinventing the wheel.
Just make sure you describe such a utils package as what it is, or extend it into something more useful for others.
I am a little bit confused about the difference between a package and a library. When I install packages from pypi.org, these packages contain several sub-packages, which contain modules. When I googled the difference between a package and a library, I found this.
That being the case, can a package containing several sub-packages also be called a library? If not, then what is a library? And what is the difference between a library and a package containing sub-packages?
Library
Most often this refers to the general (standard) library, or to another collection created with a similar format and purpose. The general library is the sum of the 'standard', popular and widely used modules, which can be thought of as single-file tools or shortcuts that make things possible or faster. The general library is an option most people enable when installing Python. Because it carries the name "Python General Library", other collections with a similar structure and idea are also called libraries: simply a bunch of modules, maybe even packages, grouped together, usually in a list (the list is usually there so you can download them). Generally it is just related files with shared interests. That is the easiest way to describe it.
Module
A module refers to a file. The file has a script 'in it', and the name of the file is the name of the module; Python files end with .py. All the file contains is code that, run together, makes something happen, using functions, strings, etc. The modules you probably see most often are popular because they are special modules that can pull in things from other files/modules. It is confusing because the name of the file and the module are equal, just dropping the .py. Really it's just code written by somebody to make something easier or possible, which you can use as a shortcut.
Package
This term is used rather loosely, and context makes a difference. The most common use, in my experience, is for multiple modules (or files) that are grouped together. Why they are grouped together can be for a few reasons, and that is when context matters. I have seen 'package(s)' used for groups of downloaded, created and/or stored modules; all of these can be true at once, or only one of them, but really a package is a file that references other files, which need to be in the correct structure or format, and that entire sum is the package itself, whether installed or included in the Python general library. A package contains modules (.py files) because they depend on each other, and on their own they sometimes may not work correctly, or at all. There is always a common goal for every part (module/file) of a package, and the total sum of all of the parts is the package itself.
Most often in Python, packages are also modules, because the package name is the name of the module that is used to connect all the pieces. So you can import a package because it is a module, and that also allows it to call upon other modules, which are not packages themselves because they only perform a certain function or task and don't involve other files. Packages have a goal, and each module works together to achieve that final goal.
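A minimal, made-up example of what that looks like on disk:

sound/
    __init__.py
    effects.py
    filters.py

Here sound is both the package (the directory with an __init__.py) and a module you can import (import sound), and it ties effects.py and filters.py together under one name (import sound.effects, from sound import filters).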
Most confusion comes from a simple file name, or a prefix to a file, being used first as the module name and then again as the package name.
Remember that modules and packages can be installed. 'Library' is usually a generic term for listing or grouping modules and packages, much like Python's general library. A strict hierarchy wouldn't really work, and APIs don't really belong in one; if you did include them, they could sit anywhere around scripts, modules and packages, since 'library' is such a general word that it is easily applied to many things, which also lets an API sit above or below it. Some modules are based on other code, and that is the only time I think this relates to a purely Python-related discussion.
I have a big Python 3.7+ project and I am currently in the process of splitting it into multiple packages that can be installed separately. My initial thought was to have a single Git repository with multiple packages, each with its own setup.py. However, while doing some research on Google, I found people suggesting one repository per package (e.g., Python - setuptools - working on two dependent packages (in a single repo?)). Yet nobody provides a good explanation as to why they prefer such a structure.
So, my questions are the following:
What are the implications of having multiple packages (each with its own setup.py) on the same GitHub repo?
Am I going to face issues with such a setup?
Are the common Python tools (documentation generators, PyPI packaging, etc.) compatible with such a setup?
Is there a good reason to prefer one setup over the other?
Please keep in mind that this is not an opinion-based question. I want to know if there are any technical issues or problems with any of the two approaches.
Also, I am aware (and please correct me if I am wrong) that pip/setuptools now allows installing dependencies from GitHub repos, even if the setup.py is not at the root of the repository.
One aspect is covered here
https://pip.readthedocs.io/en/stable/reference/pip_install/#vcs-support
In particular, if setup.py is not in the root directory you have to specify the subdirectory where to find setup.py in the pip install command.
So if your repository layout is:
pkg_dir/
    setup.py        # setup.py for package pkg
    some_module.py
other_dir/
    some_file
    some_other_file
You'll need to use pip install -e "vcs+protocol://repo_url/#egg=pkg&subdirectory=pkg_dir".
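For example, for a hypothetical GitHub repository with that layout, the command might be:

pip install -e "git+https://github.com/someuser/somerepo.git#egg=pkg&subdirectory=pkg_dir"

The quotes matter: without them most shells will treat the # as the start of a comment and the & as a command separator.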
"Best" approach? That's a matter of opinion, which is not the domain of SO. But here are a couple of justifications for creating separate packages:
The package is functionally independent of the other packages in your project.
That is, it doesn't import from them and performs a function that could be useful to other developers. Extra points if the function this package performs is similar to packages already on PyPI.
Extra points if the package has a stable API and clear documentation. Penalty points if the package is a thin grab bag of unrelated functions that you factored out of multiple packages for ease of maintenance, but that don't have a unifying principle.
The package is optional with respect to your main project, so there'd be cases where users could reasonably choose to skip installing it.
Perhaps one package is a "client" and the other is the "server". Or perhaps the package provides OS-specific capabilities.
Note that a package like this is not functionally independent of the main project and so does not qualify under the previous bullet point, but this would still be a good reason to separate it.
I agree with #boriska's point that the "single package" project structure is a maintenance convenience well worth striving for. But not (and this is just my opinion, I'm going to get downvoted for expressing it) at the expense of cluttering up the public package index with a large number of small packages that are never installed separately.
I am researching the same issue myself. PyPa documentation recommends the layout described in 'native' subdirectory of: https://github.com/pypa/sample-namespace-packages
I find the single-package structure described below very useful; see the discussion around testing the 'installed' version.
https://blog.ionelmc.ro/2014/05/25/python-packaging/#the-structure
I think this can be extended to multiple packages. Will post as I learn more.
The major problem I've faced when splitting two interdependent packages into two repos came from CI and testing, specifically branch protections.
Say you have package A and package B and you make some (breaking) changes in both. The automated tests for package A fail because they use the main branch of B (which is no longer compatible with the new version of A), so you can't merge A. And the same problem holds the other way around.
tldr:
After breaking changes, automated tests on merge will fail because they use the main branch of the other repo, making it impossible to merge.
My project has to be extensible; I have a lot of scripts with the same interface that look up things online. Before, I was using __import__, but that does not let me put my 'plugins' in a dedicated directory:
root/
    main.py
    plugins/
        [...]
So my question is: is there a way to individually import modules from that subdirectory? I'm guessing importlib, but I'm so lost in how the Python module loading process works... What I want to do is something like this:
for pluginname in plugins:
    plugin = somekindofimport("plugins/{name}".format(name=pluginname))
    plugin.unifiedinterface()
Also, as a side question, is the way I am trying to achieve extensibility a good one?
I'm on Python 3.3.
Stop thinking in terms of pathnames and start thinking in terms of packages. Read Packages in the tutorial, and if you want more detail see The import system.
But the basic idea is this:
Create a file named plugins/__init__.py. It can be empty; that's enough to turn plugins into a package, which means you can import modules from that package with:
import plugins.plugin
So, how do you do this dynamically? That's what importlib is for. (You can also use __import__ here, but it's less flexible, and less readable in non-trivial cases, so unless you need pre-3.3 compatibility, don't.)
import importlib

plugin = importlib.import_module('plugins.{name}'.format(name=pluginname))
It would probably be cleaner to import plugins to get the package, and then use relative imports from within that package, as shown in the examples in the import_module docs.
This also means Python takes care of the .pyc creation and caching, etc.
And it means that you can later expand plugins to be a "namespace package", which can be split across multiple directories like /usr/share/myapp/plugins for stock plugins, /etc/myapp/plugins for site plugins and ~/myapp/plugins for user-specific plugins.
If you really, really want to import from a directory that isn't a package, you can create a module loader and use it, but that's a whole lot of work for no actual benefit. (It's actually not that hard in 3.3 (SourceLoader and friends will do most of the work for you), but you will find almost no examples out there to guide you; instead, you'll find examples of the 2.6-3.2 way, or the 2.0-2.5 way, both of which are hard.) Plus, it means that if someone creates a plugin named, say, gzip, you can end up blocking the stdlib gzip module with the plugin. (That's especially fun if the gzip plugin tries to use the gzip stdlib module, as it likely will…) If the plugin ends up being named plugins.gzip, there's no problem.
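Putting that together, a minimal discovery-and-load sketch (it assumes the plugins/ package from the question and that every plugin module exposes the unifiedinterface() function the asker mentions) could look like this:

import importlib
import pkgutil

import plugins  # the package created by adding plugins/__init__.py

# Walk the modules that live inside the plugins package and load each one.
for finder, name, ispkg in pkgutil.iter_modules(plugins.__path__):
    plugin = importlib.import_module('plugins.{name}'.format(name=name))
    plugin.unifiedinterface()  # assumed common entry point of every plugin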
Also, as a side question, is the way I am trying to achieve extensibility a good one?
As long as you only want to support 3.3+, yes, I think this is a great solution.
Before 3.3, using a package for plugins was a lot more problematic. People have come up with a variety of different plugin systems—in one case going so far as to dynamically create module objects and execfile into them. If you need to deal with that, I would suggest looking at existing Python apps with plugins (e.g., MusicBrainz Picard) to get different ideas.
I'm working toward adopting Python as part of my team's development tool suite. With the other languages/tools we use, we develop many reusable functions and classes that are specific to the work we do. This standardizes the way we do things and saves a lot of wheel re-inventing.
I can't seem to find any examples of how this is usually handled with Python. Right now I have a development folder on a local drive, with multiple project folders below that, and an additional "common" folder containing packages and modules with re-usable classes and functions. These "common" modules are imported by modules within multiple projects.
Development/
    Common/
        Package_a/
        Package_b/
    Project1/
        Package1_1/
        Package1_2/
    Project2/
        Package2_1/
        Package2_2/
In trying to learn how to distribute a Python application, it seems that there is an assumption that all referenced packages are below the top-level project folder, not collateral to it. The thought also occurred to me that perhaps the correct approach is to develop common/framework modules in a separate project, and once tested, deploy those to each developer's environment by installing to the site-packages folder. However, that also raises questions re distribution.
Can anyone shed light on this, or point me to a resource that discusses this issue?
If you have common code that you want to share across multiple projects, it may be worth thinking about storing this code in a physically separate project, which is then imported as a dependency into your other projects. This is easily achieved if you host the common code project on GitHub or Bitbucket, where you can use pip to install it into any other project. This approach not only helps you easily share common code across multiple projects, but also helps protect you from inadvertently creating bad dependencies (i.e., ones directed from your common code to your non-common code).
The link below provides a good introduction to using pip and virtualenv to manage dependencies; it's definitely worth a read if you and your team are fairly new to working with Python, as this is a very common toolchain for just this kind of problem:
http://dabapps.com/blog/introduction-to-pip-and-virtualenv-python/
And the link below shows you how to pull in dependencies from github using pip:
How to use Python Pip install software, to pull packages from Github?
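As a sketch (the repository name, URL and tag are hypothetical), that boils down to something like:

$ pip install git+https://github.com/yourteam/common-utils.git@v0.1.0

The same URL, optionally pinned to a tag or commit as above, can also go into a requirements.txt so every developer and deployment picks up the same version of the common code.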
The must-read-first on this kind of stuff is here:
What is the best project structure for a Python application?
in case you haven't seen it (and follow the link in the second answer).
The key is that each major package be importable as if "." were the top-level directory, which means that it will also work correctly when installed in site-packages. What this implies is that major packages should all be flat within the top directory, as in:
myproject-0.1/
    myproject/
    framework/
    packageA/
        sub_package_in_A/
            module.py
    packageB/
    ...
Then both you (within your other packages) and your users can import as:
import myproject
import packageA.sub_package_in_A.module
etc
This means you should think hard about #MattAnderson's comment, but if you want it to appear as a separately-distributable package, it needs to be in the top directory.
Note this doesn't stop you (or your users) from doing an:
import packageA.sub_package_in_A as sub_package_in_A
but it does stop you from allowing:
import sub_package_in_A
directly.
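As a rough sketch (the metadata values are placeholders; only the layout matters), the corresponding setup.py just declares those flat top-level packages:

from setuptools import setup, find_packages

setup(
    name='myproject',
    version='0.1',
    # picks up myproject, framework, packageA (and its sub-packages), packageB, ...
    packages=find_packages(),
)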
...it seems that there is an assumption that all referenced packages
are below the top-level project folder, not collateral to it.
That's mainly because the current working directory is the first entry in sys.path by default, which makes it very convenient to import modules and packages below that directory.
If you remove it, you can't even import stuff from the current working directory...
$ touch foo.py
$ python
>>> import sys
>>> del sys.path[0]
>>> import foo
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named foo
The thought also occurred to me that perhaps the correct approach is
to develop common/framework modules in a separate project, and once
tested, deploy those to each developer's environment by installing to
the site-packages folder.
It's not really a major issue for development. If you're using version control, and all developers check out the source tree in the same structure, you can easily employ relative path hacks to ensure the code works correctly without having to mess around with environment variables or symbolic links.
However, that also raises questions re distribution.
This is where things can get a bit more complicated, but only if you're planning to release libraries independently of the projects which use them, and/or have multiple project installers share the same libraries. If that's the case, take a look at distutils.
If not, you can simply employ the same relative path hacks used in development to ensure your project works "out of the box".
I think that this is the best reference for creating a distributable python package:
link removed as it leads to a hacked site.
Also, don't feel that you need to nest everything under a single directory. You can do things like:
platform/
    core/
        coremodule
    api/
        apimodule
and then do things like from platform.core import coremodule, etc.
We have a growing library of apps depending on a set of common util modules. We'd like to:
share the same utils codebase between all projects
allow utils to be extended (and fixed!) by developers working on any project
have this be reasonably simple to use for devs (i.e. not a big disruption to workflow)
cross-platform (no diffs for devs on Macs/Win/Linux)
We currently do this "manually", with the utils versioned as part of each app. This has its benefits, but is also quite painful to repeatedly fix bugs across a growing number of codebases.
On the plus side, it's very simple to deal with in terms of workflow - util module is part of each app, so on that side there is zero overhead.
We also considered (fleetingly) using filesystem links or some such, but those aren't portable between OSes.
I understand the implications about release testing and breakage, etc. These are less of a problem than the mismatched utils are at the moment.
You can take advantage of Python paths (the paths searched when looking for a module to import).
Thus you can create a separate directory for the utils and keep it in a different repository than the projects that use them. Then include the path to this repository in PYTHONPATH.
This way, if you write import mymodule, it will eventually find mymodule in the directory containing the utils. So, basically, it will work similarly to how it works for standard Python modules.
This way you will have one repository for the utils (or a separate one for each util, if you wish), and separate repositories for the other projects, regardless of the version control system you use.
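For instance (the path is only an example), each developer adds the checkout location of the utils repository to PYTHONPATH:

$ export PYTHONPATH="$PYTHONPATH:$HOME/dev/team-utils"

(or the equivalent environment-variable setting on Windows), after which import mymodule works in every project without copying files around.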
What versioning system are you using? If you are on git, take a look at submodules. The idea in this case is that you would keep a unique, separate repository with the utils, which would be pulled into the various projects automatically.
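For example (the repository URL is hypothetical), each application repository would wire in the shared code with:

$ git submodule add https://github.com/yourteam/team-utils.git utils
$ git submodule update --init   # needed after a fresh clone of the app repo

After that the utils directory tracks a specific commit of the shared repository, and developers can push fixes upstream from any project.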
I have no direct experience with mercurial, but I believe subrepositories are the equivalent feature.
If you are under SVN... wait... I hope not! :)