Python Packaging with .pth files

I have a suite of packages that are developed together and bundled into one distribution package.
For sake of argument, let's assume I have Good Reasons for organizing my python distribution package in the following way:
SpanishInqProject/
|---SpanishInq/
| |- weapons/
| | |- __init__.py
| | |- fear.py
| | |- surprise.py
| |- expectations/
| | |- __init__.py
| | |- noone.py
| |- characters/
| | |- __init__.py
| | |- biggles.py
| | |- cardinal.py
|- tests/
|- setup.py
|- spanish_inq.pth
I've added the path configuration file spanish_inq.pth to add SpanishInq to sys.path, so I can import weapons, etc. directly.
I want to be able to use setuptools to build wheels and have pip install weapons, expectations and characters inside the SpanishInq directory, but without making SpanishInq a package or namespace.
My setup.py:
from setuptools import setup, find_packages

setup(
    name='spanish_inq',
    packages=find_packages(),
    include_package_data=True,
)
With a MANIFEST.in file containing:
spanish_inq.pth
This has been challenging in a couple of ways:
pip install has put weapons etc. directly in the site-packages directory, rather than in a SpanishInq dir.
My spanish_inq.pth file ends up in the sys.exec_prefix dir rather than in my site-packages dir, so the relative path in it is now useless.
I was able to sort of solve the first problem by turning SpanishInq into a package (which I'm not happy about). But I still want to import weapons etc. without SpanishInq as a namespace, and for that I need SpanishInq added to sys.path. That is where I was hoping the .pth file would help, but I can't get it to go where it ought to.
So...
How do I get the .pth file to install into the site-packages dir?
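The relative-path problem above comes from how the standard site module treats .pth files: each non-import line is joined onto the directory that contains the .pth file, and is only added to sys.path if the resulting directory exists. The sketch below simulates this with a throwaway directory standing in for site-packages (the temporary directory and the NAME attribute are assumptions for illustration); site.addsitedir applies the same processing that Python performs on site-packages at start-up.

```python
import os
import site
import sys
import tempfile

# A throwaway directory stands in for site-packages (assumption for the demo).
site_dir = tempfile.mkdtemp()

# Recreate part of the question's layout: <site_dir>/SpanishInq/weapons/__init__.py
weapons_dir = os.path.join(site_dir, "SpanishInq", "weapons")
os.makedirs(weapons_dir)
with open(os.path.join(weapons_dir, "__init__.py"), "w") as f:
    f.write("NAME = 'weapons'\n")

# The .pth file holds a single relative path.
with open(os.path.join(site_dir, "spanish_inq.pth"), "w") as f:
    f.write("SpanishInq\n")

# addsitedir() appends site_dir to sys.path and processes its .pth files;
# the line "SpanishInq" is resolved as <site_dir>/SpanishInq.
site.addsitedir(site_dir)

import weapons  # importable without a SpanishInq package

print(weapons.NAME)  # prints: weapons
```

If the same .pth file ends up in sys.exec_prefix instead, that directory is not processed as a site directory, so the relative line is never resolved against the SpanishInq folder.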

This is very similar to setup.py: installing just a pth file? (this question is strictly a superset, in terms of functionality) -- I've adapted the relevant part of my answer there below.
The right thing to do here is to extend setuptools' build_py and copy the pth file into the build directory, at the location where setuptools prepares all the files that go into site-packages.
import os

from setuptools import setup
from setuptools.command.build_py import build_py


class build_py_with_pth_file(build_py):
    """Include the .pth file for this project, in the generated wheel."""

    def run(self):
        super().run()

        destination_in_wheel = "spanish_inq.pth"
        location_in_source_tree = "spanish_inq.pth"

        # build_lib is the staging dir whose contents land in site-packages.
        outfile = os.path.join(self.build_lib, destination_in_wheel)
        self.copy_file(location_in_source_tree, outfile, preserve_mode=0)


setup(
    ...,
    cmdclass={"build_py": build_py_with_pth_file},
)
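Once the wheel is built with the custom build_py, you can check that spanish_inq.pth landed at the root of the archive, since top-level wheel entries are what pip unpacks directly into site-packages. A minimal sketch using only zipfile; the function name pth_files_at_wheel_root is hypothetical, and the in-memory demo archive (including its dist-info name) stands in for a real .whl.

```python
import io
import zipfile


def pth_files_at_wheel_root(wheel):
    """Return the .pth entries stored at the top level of a wheel.

    wheel may be a path to a .whl file or a binary file-like object.
    Top-level entries are the ones pip unpacks directly into site-packages.
    """
    with zipfile.ZipFile(wheel) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if "/" not in name and name.endswith(".pth")
        )


# Demo archive standing in for a real .whl built with the custom build_py.
demo_wheel = io.BytesIO()
with zipfile.ZipFile(demo_wheel, "w") as archive:
    archive.writestr("spanish_inq.pth", "SpanishInq\n")
    archive.writestr("SpanishInq/weapons/__init__.py", "")
    archive.writestr("spanish_inq-0.0.0.dist-info/RECORD", "")

print(pth_files_at_wheel_root(demo_wheel))  # prints: ['spanish_inq.pth']
```

For a real build, pass the path of the generated .whl file instead of the in-memory archive.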

Related

setuptools - bundle package from framework from relative path

With a project setup as follows:
---------------------
root
    FrameworkPackage1
        __init__.py
        sourcefile1.py
    FrameworkPackage2
        __init__.py
        sourcefile2.py
    apps
        Project
            src
                MyApp
                    __init__.py
                    __main__.py
            setup.py
            README.md
---------------------
When I'm creating the setup.py, from what I understand, I use package_dir to set the location of these packages.
---------------------
packages=['MyApp', 'FrameworkPackage1', 'FrameworkPackage2'],
package_dir={'': 'src',
'FrameworkPackage1': '../../FrameworkPackage1',
'FrameworkPackage2': '../../FrameworkPackage2'}
---------------------
So this correctly builds a package with all the required files. However, when I try to install, it fails, and if I just try to untar/gz the package file it puts FrameworkPackage1/2 in the "../../.." dir from where the unzip happens.
Ideally I'd like the package to work as follows and install from pip so I could run the following:
import MyApp as ma
import FrameworkPackage1 as fp1
import FrameworkPackage2 as fp2
print(ma.Function())
print(fp1.OtherFunction())
print(fp2.OtherFunction())
Is there a way to set the frameworks to be found in "../../../" but install into the root of the distribution?
Firstly, as @a_guest suggested, shouldn't the package_dir look like this?
packages=['MyApp', 'FrameworkPackage1', 'FrameworkPackage2'],
package_dir={'': 'src',
'FrameworkPackage1': '../../FrameworkPackage1',
'FrameworkPackage2': '../../FrameworkPackage2'}
Alternatively, you could try adding an __init__.py to the root folder so that it is recognized as a Python package (based on this question).
Secondly, instead of using this bundled structure for your package, you could either:
If the Framework packages are used elsewhere: Treat each package separately. This would allow you to evolve them separately, and add them to your MyApp by simply including them in the requirements.txt (or equivalents). A con of this is that each one would have its own setup.py, but this is a much better packaging practice.
If the Framework packages are not used elsewhere (or you just want your local copy): switch to a project setup with the setup.py directly in the main folder and all packages under src (package_dir={'': 'src'} is then enough, since each package is found at src/<package name>), with a structure looking like:
---------------------
...
Project
    src
        MyApp
            __init__.py
            __main__.py
        FrameworkPackage1
            __init__.py
            sourcefile1.py
        FrameworkPackage2
            __init__.py
            sourcefile2.py
    setup.py
    README.md
---------------------
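A quick way to see why the flattened src layout works: with package_dir={'': 'src'}, each directory directly under src/ that contains an __init__.py becomes its own top-level package. The sketch rebuilds that layout in a temporary directory (the paths are illustrative) and lists the top-level packages with pkgutil from the standard library.

```python
import os
import pkgutil
import tempfile

# Rebuild the suggested Project/src layout in a temporary directory.
project = tempfile.mkdtemp()
src = os.path.join(project, "src")
for package in ("MyApp", "FrameworkPackage1", "FrameworkPackage2"):
    os.makedirs(os.path.join(src, package))
    # An empty __init__.py marks each directory as a regular package.
    open(os.path.join(src, package, "__init__.py"), "w").close()

# With package_dir={'': 'src'}, each package directly under src/ is top level.
top_level = sorted(info.name for info in pkgutil.iter_modules([src]) if info.ispkg)
print(top_level)  # prints: ['FrameworkPackage1', 'FrameworkPackage2', 'MyApp']
```

In setup.py itself, setuptools.find_packages(where='src') should discover the same three names, so the packages list does not have to be written by hand.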

Configure pylint for modules within eggs. (VS code)

Project structure
I have the following folder structure
|
|- src
| |- mypackage
| | |- __init__.py
| | |- mymodule.py
| |- utils.egg
|- main.py
In mymodule.py I can import the egg by adding it to sys.path:
import sys
sys.path.append('src/utils.egg')
import utils
When calling main.py everything works fine (python -m main).
Problem
The problem comes from pylint. First, it shows the following message in mymodule.py file
Unable to import 'utils' pylint(import-error)
If I ask for suggestions (Ctrl + Space) when importing, I get:
utils.build
.dist
.utils
.setup
# |- suggestions
And from utils.utils I can access the actual classes/functions in the utils module. Of course, if I import utils.utils, an import error pops up when executing the main script.
How can I configure my vscode setting in order fix pylint?
Should I install the egg instead of copying it to the working folder?
Is my project's folder structure OK, or does it go against recommended practices?
Extra info
In case you're wondering, the EGG-INFO/SOURCES.txt file looks like this:
setup.py
utils/__init__.py
utils/functions.py
utils.egg-info/PKG-INFO
utils.egg-info/SOURCES.txt
utils.egg-info/dependency_links.txt
utils.egg-info/top_level.txt
utils/internals/__init__.py
utils/internals/somemodule.py
utils/internals/someothermodule.py
Also, there are no build or dist folders in the egg.
This is an issue with Pylint itself and not the Python extension, so it will come down to however you need to configure Pylint.
As for whether you should copy an egg around or install it: you should be installing it into your virtual environment, or at least copying over the appropriate .pth file so that the egg directory ends up on sys.path.
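If you go the .pth route instead of installing the egg, the file just needs to sit in the environment's site-packages and contain the egg's path; because relative lines are resolved against site-packages, writing an absolute path is the safer choice. A sketch, assuming Pylint runs under the same interpreter as the environment selected in VS Code; write_egg_pth and the utils_egg.pth file name are hypothetical, and the demo writes to temporary directories rather than the real site-packages.

```python
import os
import site
import sys
import sysconfig
import tempfile


def write_egg_pth(egg_path, site_dir=None, name="utils_egg.pth"):
    """Write a .pth file that puts egg_path on sys.path at interpreter start-up.

    site_dir defaults to the active environment's purelib (site-packages).
    The path is written as an absolute path, because relative lines in a
    .pth file are resolved against site_dir, not the project folder.
    """
    if site_dir is None:
        site_dir = sysconfig.get_paths()["purelib"]
    pth_file = os.path.join(site_dir, name)
    with open(pth_file, "w") as f:
        f.write(os.path.abspath(egg_path) + "\n")
    return pth_file


# Demo with temporary directories instead of the real environment.
egg_dir = os.path.join(tempfile.mkdtemp(), "utils.egg")
os.makedirs(egg_dir)
fake_site_dir = tempfile.mkdtemp()
pth_file = write_egg_pth(egg_dir, site_dir=fake_site_dir)

# Same .pth processing Python performs on site-packages at start-up.
site.addsitedir(fake_site_dir)
print(os.path.abspath(egg_dir) in sys.path)  # prints: True
```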

pip installing data files to the wrong place

The source for the package is here
I'm installing the package from the index via:
easy_install hackertray
pip install hackertray
easy_install installs images/hacker-tray.png to the following folder:
/usr/local/lib/python2.7/dist-packages/hackertray-1.8-py2.7.egg/images/
While, pip installs it to:
/usr/local/images/
My setup.py is as follows:
from setuptools import setup

setup(name='hackertray',
      version='1.8',
      description='Hacker News app that sits in your System Tray',
      packages=['hackertray'],
      data_files=[('images', ['images/hacker-tray.png'])])
My MANIFEST file is:
include images/hacker-tray.png
Don't use data_files with relative paths. Actually, don't use data_files at all, unless you make sure the target paths are absolute ones, properly generated in a cross-platform way instead of hard-coded values.
Use package_data instead:
setup(
# (...)
package_data={
"hackertray.data": [
"hacker-tray.png",
],
},
)
where hackertray.data is a proper python package (i.e. is a directory that contains a file named __init__.py) and hacker-tray.png is right next to __init__.py.
Here's how it should look:
.
|-- hackertray
| |-- __init__.py
| `-- data
| |-- __init__.py
| `-- hacker-tray.png
`-- setup.py
You can get the full path to the image file using:
import os
from pkg_resources import resource_filename

print(os.path.abspath(resource_filename('hackertray.data', 'hacker-tray.png')))
I hope that helps.
PS: Python<2.7 seems to have a bug regarding packaging of the files listed in package_data. Always make sure to have a manifest file if you're using something older than Python 2.7 for packaging. See here for more info: https://groups.google.com/d/msg/python-virtualenv/v5KJ78LP9Mo/OiBqMcYVFYAJ
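On Python 3.9+, importlib.resources from the standard library can replace pkg_resources for this lookup. The sketch builds a throwaway copy of the hackertray/data layout shown above (with placeholder PNG bytes, an assumption for the demo) so that it runs anywhere; in an installed project only the files()/as_file() lines are needed.

```python
import importlib.resources
import os
import sys
import tempfile

# Throwaway copy of the layout above: hackertray/data/{__init__.py, hacker-tray.png}.
root = tempfile.mkdtemp()
data_dir = os.path.join(root, "hackertray", "data")
os.makedirs(data_dir)
for init_file in (os.path.join(root, "hackertray", "__init__.py"),
                  os.path.join(data_dir, "__init__.py")):
    open(init_file, "w").close()
with open(os.path.join(data_dir, "hacker-tray.png"), "wb") as f:
    f.write(b"\x89PNG placeholder bytes")
sys.path.insert(0, root)

# files() imports the package and returns a Traversable for its contents.
resource = importlib.resources.files("hackertray.data") / "hacker-tray.png"

# as_file() guarantees a real filesystem path, extracting to a temporary
# file if the package happens to live inside a zip archive.
with importlib.resources.as_file(resource) as path:
    print(os.path.abspath(path))

print(resource.read_bytes()[:4])  # prints: b'\x89PNG'
```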

Handle file imports after package installation [duplicate]

I use setuptools to distribute my python package. Now I need to distribute additional datafiles.
From what I've gathered from the setuptools documentation, I need to have my data files inside the package directory. However, I would rather have my data files inside a subdirectory in the root directory.
What I would like to avoid:
/ #root
|- src/
| |- mypackage/
| | |- data/
| | | |- resource1
| | | |- [...]
| | |- __init__.py
| | |- [...]
|- setup.py
What I would like to have instead:
/ #root
|- data/
| |- resource1
| |- [...]
|- src/
| |- mypackage/
| | |- __init__.py
| | |- [...]
|- setup.py
I just don't feel comfortable having so many subdirectories if they're not essential. I can't find a reason why I /have/ to put the files inside the package directory, and IMHO it is cumbersome to work with so many nested subdirectories. Or is there a good reason that justifies this restriction?
Option 1: Install as package data
The main advantage of placing data files inside the root of your Python package
is that it lets you avoid worrying about where the files will live on a user's
system, which may be Windows, Mac, Linux, some mobile platform, or inside an Egg. You can
always find the data directory relative to your Python package root, no matter where or how it is installed.
For example, if I have a project layout like so:
project/
    foo/
        __init__.py
        data/
            resource1/
                foo.txt
You can add a function to __init__.py to locate an absolute path to a data
file:
import os

_ROOT = os.path.abspath(os.path.dirname(__file__))

def get_data(path):
    return os.path.join(_ROOT, 'data', path)

print(get_data('resource1/foo.txt'))
Outputs:
/Users/pat/project/foo/data/resource1/foo.txt
After the project is installed as an Egg the path to data will change, but the code doesn't need to change:
/Users/pat/virtenv/foo/lib/python2.6/site-packages/foo-0.0.0-py2.6.egg/foo/data/resource1/foo.txt
Option 2: Install to fixed location
The alternative would be to place your data outside the Python package and then
either:
Have the location of data passed in via a configuration file or command line arguments, or
Embed the location into your Python code.
This is far less desirable if you plan to distribute your project. If you really want to do this, you can install your data wherever you like on the target system by specifying the destination for each group of files by passing in a list of tuples:
from setuptools import setup

setup(
    ...
    data_files=[
        ('/var/data1', ['data/foo.txt']),
        ('/var/data2', ['data/bar.txt']),
    ],
)
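If you do install to a fixed location, the "pass the location in" option above can be combined with a default so the code still works when nothing is configured. A minimal sketch, assuming a hypothetical --data-dir flag and a hypothetical FOO_DATA_DIR environment variable, with /var/data1 from the data_files example as the fallback; the flag wins over the variable, which wins over the default.

```python
import argparse
import os

# Hypothetical default, matching the data_files target above.
DEFAULT_DATA_DIR = "/var/data1"


def resolve_data_dir(argv=None, environ=None):
    """Pick the data directory: --data-dir flag, then FOO_DATA_DIR, then default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data-dir",
        default=environ.get("FOO_DATA_DIR", DEFAULT_DATA_DIR),
        help="directory containing foo.txt",
    )
    return parser.parse_args(argv).data_dir


print(resolve_data_dir([], {}))                                # prints: /var/data1
print(resolve_data_dir([], {"FOO_DATA_DIR": "/opt/foo"}))      # prints: /opt/foo
print(resolve_data_dir(["--data-dir", "/tmp/data"], {}))       # prints: /tmp/data
```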
Updated: Example of a shell function to recursively grep Python files:
atlas% function grep_py { find . -name '*.py' -exec grep -Hn $* {} \; }
atlas% grep_py ": \["
./setup.py:9: package_data={'foo': ['data/resource1/foo.txt']}
I think I found a good compromise which will allow you to maintain the following structure:
/ #root
|- data/
| |- resource1
| |- [...]
|- src/
| |- mypackage/
| | |- __init__.py
| | |- [...]
|- setup.py
You should install data as package_data to avoid the problems described in samplebias's answer, but in order to maintain the file structure you should add this to your setup.py:
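import os

from setuptools import setup

try:
    # Expose the top-level data/ dir inside the package only while setup() runs.
    os.symlink('../../data', 'src/mypackage/data')
    setup(
        ...
        package_data = {'mypackage': ['data/*']}
        ...
    )
finally:
    os.unlink('src/mypackage/data')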
This way we create the appropriate structure "just in time" and keep our source tree organized.
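The try/finally symlink trick can be packaged as a context manager, so the link is created right before the protected block and always removed afterwards. A sketch with a hypothetical temporary_symlink helper, exercised on a temporary copy of the layout. Unlike the version above, the link is created before the try, so a failed symlink call does not trigger an unlink of whatever already exists at that path. The relative target ../../data is resolved from the link's own directory (src/mypackage), and creating symlinks on Windows may require extra privileges.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temporary_symlink(target, link_name):
    """Create link_name pointing at target for the duration of the block."""
    os.symlink(target, link_name)
    try:
        yield link_name
    finally:
        # Runs even if the block (e.g. setup()) raises.
        os.unlink(link_name)


# Demo on a temporary copy of the layout: data/ next to src/mypackage/.
project = tempfile.mkdtemp()
os.makedirs(os.path.join(project, "data"))
os.makedirs(os.path.join(project, "src", "mypackage"))
link = os.path.join(project, "src", "mypackage", "data")

with temporary_symlink(os.path.join("..", "..", "data"), link):
    print(os.path.isdir(link))  # prints: True
print(os.path.lexists(link))  # prints: False
```

In setup.py, the setup(...) call would go inside with temporary_symlink('../../data', 'src/mypackage/data'):.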
To access such data files within your code, you 'simply' use:
from pkg_resources import Requirement, resource_filename
data = resource_filename(Requirement.parse("main_package"), 'mypackage/data')
I still don't like having to specify 'mypackage' in the code, as the data need not have anything to do with this module, but I guess it's a good compromise.
I could use importlib_resources or importlib.resources (depending on the Python version).
https://importlib-resources.readthedocs.io/en/latest/using.html
I think that you can basically give anything as an argument *data_files* to setup().
