Where to place example data in python package? - python

I'm setting up my first Python package, and I want to install it with some example data so that users can run the code straight off. In case it's relevant my package is on github and I'm using pip.
At the moment my example data is being installed with the rest of the package into site_packages/, by setting include_package_data=True in setup.py, and referencing the files I want to include in MANIFEST.in. However, while this makes sense to me for files used by the code as part of its processing, it doesn't seem especially appropriate for example data.
What is best/standard practice for deploying example data with a python package?

You can put your example data in the repository, in an examples folder next to your project sources, and exclude it from the package with prune examples in your manifest file (MANIFEST.in).
There is no universal, standard advice for this. Do whatever suits your needs.

Related

Python distribution package based on local git commit

I am trying to create a python distribution package following https://packaging.python.org/en/latest/tutorials/packaging-projects/. My source folder contains many irrelevant files and subfolders which should be excluded from the distribution, such as temporary files, auxiliary code, test output files, private notes, etc. The knowledge of which files are relevant and which not, is already represented in the git source control. Instead of replicating all this as inclusion/exclusion rules in the pyproject configuration file (and needing to keep it up to date going forward), I would like my build chain to be based on a git commit/tag. This will also be useful for keeping versioning in sync between my repo history and pypi. I know there is an option to do it with github actions, but my question is how to do it locally, based just on git rather than github.
Edit following comment: I agree that you don't always want the repo and distro trees to be the same, but it would be much simpler to control if the distro starts from the repo tree as a baseline, with a few additional exclusion rules on top of that.
To automatically include files from a Git or Mercurial repository, you can use setuptools_scm. The tool can also set the package version automatically from a repository tag and the number of commits since that tag.
It is a setuptools plugin, so the rest of the build still goes through standard setuptools.

Extract dependencies and versions from a gradle file with Python

I need to extract the dependencies and versions from a build.gradle file. I do not have access to the project folder, only to the file, so this answer does not work in my case. I am using Python to do the parsing, but it has not worked for me, especially since the file has no predefined structure the way JSON, for example, does.
I'm using these files to test my parsing:
Twidere gradle
votling grade
Thanks in advance
Unfortunately, you can't do what you want.
As you can see from the answer given to the SO post you linked, a gradle build file is a script. That script is written in either Kotlin or Groovy, and you can programmatically define the version and dependencies in a multitude of ways. For instance, to set a version, you can hard code it in the script, reference a system property, or get it through an included plugin and more. In your first example, it is set through an extension property, and in the second it is not even defined - likely leaving it up to the individual sub-projects if they even use it. In both examples, the build files are just a small part of a larger multi-project, and each individual project potentially has their own defined dependencies and version.
So there is really no way to tell without actually evaluating the script. And you can't do that unless you have access to the full project structure.
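To make the limitation concrete, here is a minimal sketch (not a real Gradle parser) of what a text-only approach can and cannot recover. It only matches dependencies written as literal "group:name:version" strings. The regex, the function name literal_dependencies, and the sample coordinates are all made up for illustration.

```python
import re

# Matches only literal "group:name:version" strings, for example:
#   implementation 'com.example:alpha:1.2.0'
#   testImplementation("com.example:beta:4.12")
DEPENDENCY_RE = re.compile(
    r"""^[ \t]*(\w+)[ \t(]+['"]([^'":\s]+):([^'":\s]+):([^'"\s]+)['"]""",
    re.MULTILINE,
)


def literal_dependencies(gradle_text):
    """Return (configuration, group, name, version) tuples for literal declarations.

    The version is None when it refers to a variable (contains "$"), because
    its value is only known once Gradle evaluates the script.
    """
    found = []
    for config, group, name, version in DEPENDENCY_RE.findall(gradle_text):
        found.append((config, group, name, None if "$" in version else version))
    return found


# Hypothetical build script fragment used only to exercise the sketch.
SAMPLE = '''
dependencies {
    implementation 'com.example:alpha:1.2.0'
    testImplementation("com.example:beta:4.12")
    compile "com.example:gamma:$libVersion"
    implementation project(':core')
}
'''

deps = literal_dependencies(SAMPLE)
for dep in deps:
    print(dep)
```

The third entry comes back with no version, and the project(':core') line is not seen at all. Anything set through properties, plugins, or sub-projects is equally invisible to this kind of matching, which is why evaluating the script is the only reliable route.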

Load text file in python module after installation using pip/other installer

My goal is to make a program I've written easily accessible to potential employers/etc. in order to... showcase my skills.. or whatever. I am not a computer scientist, and I've never written a python module meant for installation before, so I'm new to this aspect.
I've written a machine learning algorithm, and fit parameters to data that I have locally. I would like to distribute the algorithm with "default" parameters, so that the downloader can use it "out of the box" for classification without having a training set. I've written methods which save the parameters to/load the parameters from text files, which I've confirmed work on my platform. I could simply ask users to download the files I've mentioned separately and use the loadParameters method I've created to manually load the parameters, but I would like to make the installation process as easy as possible for people who may be evaluating me.
What I'm not sure is how to package the text files in such a way that they can automatically be loaded in the __init__ method of the object I have.
I have put the algorithm and files on github here, and written a setup.py script so that it can be downloaded from github using pip like this:
pip install --upgrade https://github.com/NathanWycoff/SySE/tarball/master
However, this doesn't seem to install the text files containing the data I need, only the __init__.py python file containing my code.
So I guess the question boils down to: How do I force pip to download additional files aside from just the module in __init__.py? Or, is there a better way to load default parameters?
Yes, there is a better way to distribute data files with a Python package.
First of all, read up on proper Python package structure. For instance, it's not recommended to put code into __init__.py files. They just mark a directory as a Python package, and you can put some import statements there. So it's better to put your SySE class into a file such as syse.py in that directory, and in __init__.py write from .syse import SySE.
Now to the data files. By default, setuptools distributes only *.py files and a few other special files (README, LICENCE and so on). However, you can tell setuptools that you want to distribute other files with the package. Use setup's package_data keyword argument, more about that here. Also don't forget to include all your data files in MANIFEST.in, more on that here.
If you do all of the above correctly, then you can use the pkg_resources package to discover your data files at runtime. pkg_resources handles all the possible situations: your package can be distributed in several ways, it can be installed from a pip server, from a wheel, as an egg, and so on. More on that here.
Lastly, if your package is public, I can only recommend uploading it to PyPI (if it is not public, you can run your own pip server). Register there and upload your package. You could then simply run pip install syse to install it from anywhere. It's quite likely the best way to distribute your package.
It's quite a lot of work and reading, but I'm pretty sure you will benefit from it.
Hope this helps.
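A minimal sketch of the runtime lookup described above. The answer uses pkg_resources, which setuptools has since deprecated; this sketch swaps in the standard-library importlib.resources instead (its files() function needs Python 3.9+). The package name syse_demo, the file name default_params.txt, and the helper load_default_parameters are hypothetical, and the throwaway package is built in a temporary directory only so the sketch runs without installing anything.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def load_default_parameters(package, filename):
    """Read one float per line from a data file shipped inside `package`."""
    resource = importlib.resources.files(package).joinpath(filename)
    text = resource.read_text(encoding="utf-8")
    return [float(line) for line in text.splitlines() if line.strip()]


# Demo only: create a tiny package with a data file next to its __init__.py.
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "syse_demo"  # hypothetical package name
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("", encoding="utf-8")
    (pkg_dir / "default_params.txt").write_text("0.5\n1.25\n", encoding="utf-8")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        params = load_default_parameters("syse_demo", "default_params.txt")
    finally:
        sys.path.remove(tmp)

print(params)
```

In a real package the same helper could be called from the class's __init__ method, provided the data file is listed in package_data so that it is installed next to the code.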

What to include in PyPI package?

I'm packaging my new python library for PyPI. The repository contains:
Sphinx documentation sources
Supplemental JavaScript library
Examples
Is it a good idea to include such things into a python egg?
What's the convention?
You can see the guts of the library at https://github.com/peterhudec/authomatic
You should not put everything into the Python egg; in any case, it's up to python setup.py bdist_egg to choose what gets included. In the source package you upload to PyPI, though, do include everything that can't be generated by setup.py. You can upload the documentation separately so that it gets published as well.
But generally, what you need to include in the egg is whatever is necessary for the egg to run as-is. Everything else can be included, but can also be distributed in other ways; that's up to you.
There are packages on PyPI that are entirely (or almost entirely) written in bash (virtualenvwrapper.sh is one).
If there is a supplemental JavaScript library that you can package, that wouldn't be a bad thing. This prevents the case where the user might not have npm installed, so it makes your library easier to use and your users happier.
Documentation doesn't NEED to be included, but if you want to, then by all means do it. Some libraries include documentation and some don't: github3.py now includes it while requests does not. It's up to your preference.
I personally always have the examples in the documentation, so they're included in my packages that include the documentation. I can't think of any packages off the top of my head that include a separate package of examples, but if you feel it's necessary, then go ahead. I might, however, make that a sub-directory of the library itself. It will make the name-spacing better when it is installed.
But basically, there are no set conventions beyond having the code to perform the task you say the package will perform.
What I can say about PyQt4: it includes docs, examples, plugins, ...
I do not know about your JavaScript library but I think it is no problem to include that as well.
This is an example - I do not know the convention. I would put in everything that could be important to the user of your library.

How to package example scripts using distribute?

I use distribute to package a small python library. I made a directory structure as described in the Hitchhiker's Guide to Packaging.
My question: Where (in the directory structure) do I place example scripts that show how to use the library and what changes are necessary to the setup.py?
I think it's better not to install the examples. Instead, keep an examples folder in your distribution, at the same level as your setup.py.
If you do want to install them, include them as a separate module of the package, such as 'example', a directory that holds all the example scripts, so users can still refer to them after installing.
package_data = {
    'module_1': [files],
    'module_2': [files],
    'example': [files],
}
Example scripts are a type of documentation, so install them in the same way you would install other documentation: as package_data.
