Paths from .egg file not working (os.path.dirname) - python

I have a Scrapy spider that uses gettext to translate some strings. The localization files are stored in /locale/.
When I load the translation I do it with the following code:
t = gettext.translation('sv', localedir=LOCALE_DIR, languages=['sv'])
LOCALE_DIR is set in settings.py with the following code:
import os
BASE_DIR = os.path.dirname(os.path.dirname(__file__))
LOCALE_DIR = os.path.join(BASE_DIR, 'locale')
This works great when I run the code locally with scrapy crawl spider. But when I deploy it to scrapyd, it generates an .egg file and the localization files can no longer be found. When I print LOCALE_DIR from the deployed version, it gives me /tmp/condobot-1428391146-4QuH3E.egg/locale.
I guess this is the reason why the files can not be found. The path is a subfolder of a file, which does not make sense. What I expected was that the .egg file would be extracted into a folder, and the path would point to /tmp/condobot-1428391146-4quh3e/locale.
Is there another, better way of setting the path to LOCALE_DIR than the way that I currently do it? I have also tried setting it to locale without any better results.
EDIT: I do use a setup.py file with the following code:
from setuptools import setup, find_packages
setup(
name = 'project',
version = '1.0',
packages = find_packages(),
entry_points = {'scrapy': ['settings = condobot.settings']},
package_data = {
# Include translation catalogs (*.mo, *.po) and *.txt files from any package:
'': ['*.mo', '*.po', '*.txt'],
},
)
I also tried using the following setup.py with a MANIFEST.in file:
from setuptools import setup, find_packages
setup(
name = 'project',
install_requires = ['distribute'],
version = '1.0',
packages = find_packages(),
entry_points = {'scrapy': ['settings = condobot.settings']},
include_package_data = True,
zip_safe = False,
)
MANIFEST.in
recursive-include locale *
recursive-include condobot/locale *
My file structure looks like this:
- condobot
  - locale
    - sv
      - LC_MESSAGES
        sv.mo
  pipelines.py
  settings.py
- locale
  - sv
    - LC_MESSAGES
      sv.mo
MANIFEST.in
requirements.txt
scrapy.cfg
setup.py
(I have placed the /locale/ folder both in / and in /condobot/ just to make sure that the path is not wrong.)
I have extracted the .egg file and I can confirm that it does include the /locale/ folder, and in the /locale/ folder there is /locale/sv/LC_MESSAGES/sv.mo and /locale/sv/LC_MESSAGES/sv.po.
So the issue does not seem to be that the setup.py file is not including the files in the .egg file. It seems to be that the path /......./file.egg/locale/ does not work.

One way to make sure that non-source files stay accessible after the Python module is packaged as an egg is to set zip_safe to False, so that the package is fully extracted when installed:
setup(
name = 'project',
version = '1.0',
...
zip_safe = False,
)
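If extracting the egg is not an option, the data can also be read through the package loader instead of through a filesystem path. The following is only a minimal sketch of that idea: it builds a throwaway zip archive that mimics an .egg (the names demo_pkg and greeting.txt are made up for illustration), shows that the __file__-based path does not exist on disk, and then reads the same file with pkgutil.get_data.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile

# Build a throwaway zip archive that mimics an .egg: a package plus a data file.
# "demo_pkg" and "greeting.txt" are hypothetical names used only for this sketch.
tmp_dir = tempfile.mkdtemp()
egg_path = os.path.join(tmp_dir, "demo.egg")
with zipfile.ZipFile(egg_path, "w") as archive:
    archive.writestr("demo_pkg/__init__.py", "")
    archive.writestr("demo_pkg/locale/greeting.txt", "hej")

sys.path.insert(0, egg_path)  # Python can import packages straight from a zip
import demo_pkg

# The __file__-based path points inside the archive, so the OS cannot open it.
naive_path = os.path.join(os.path.dirname(demo_pkg.__file__), "locale", "greeting.txt")
path_exists = os.path.exists(naive_path)

# pkgutil.get_data asks the package's loader for the bytes, which works for zips.
data = pkgutil.get_data("demo_pkg", "locale/greeting.txt")
text = data.decode("utf-8")
print(path_exists, text)
```

For a compiled catalog such as sv.mo, the bytes obtained this way could presumably be wrapped in io.BytesIO and passed to gettext.GNUTranslations instead of calling gettext.translation with a localedir; that part is an assumption and is not exercised in the sketch.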

Related

How to correctly install data files with setup.py?

Using python 3.8 I have the following structure in a test library:
testrepo
    setup.py
    Manifest.in
    util/
        mycode.py
        data/
            mydata.txt
The file setup.py looks like
setup(
name='testrepo',
version="0.1.0",
packages=['util'],
author='Tester',
description='Testing repo',
include_package_data=True,
package_data={
"util.data": ["*"]
},
)
and using the following Manifest.in:
include util/data/mycode.txt
and when I install this package I do not see any hint of the data folder in venv/lib/python3.8/site-packages/util (when installing the repo into a python virtual environment).
How to do it correctly, so I can read the content from the file util/data/mydata.txt using
from util import data
import importlib.resources as import_resources
text = import_resources.read_text(data, "mydata.txt")
or whatever...
Where can I find this completely documented, with examples etc.?
I guess what you have to do is to create the following basic structure of the repo:
myrepo
    setup.py
    Manifest.in
    mypackage/
        __init__.py
        mycode.py
        data/
            __init__.py
            mydata.txt
Just make sure to keep in mind 6 additional steps:
You need to put the data folder inside your package folder
You need to add __init__.py inside your data folder.
In setup.py you have to use packages=find_packages(), to find your packages.
In setup.py, you have to set include_package_data=True,
In setup.py, you have to specify the path to your data files:
package_data={
"mypackage.data": ["*"]
},
You also have to define a second file named Manifest.in that lists your data files again, as follows (using a wildcard here; you can also add each file on a separate line):
include mypackage/data/*
If you are lucky, then you can include/use your data file like
from mypackage import data
import importlib.resources as import_resources
text = import_resources.read_text(data, "mydata.txt")
or
with import_resources.path(data, "mydata.txt") as filename:
    myfilename = filename
to get the path to the data file.
Not sure this is documented anywhere.
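As a rough illustration of the layout and the read call above, the sketch below recreates the mypackage/data structure in a temporary directory and reads mydata.txt from it. It assumes Python 3.9 or newer, where importlib.resources.files is available, and it puts the temporary directory on sys.path as a stand-in for actually installing the package; the file content is made up.

```python
import os
import sys
import tempfile
from importlib import resources

# Throwaway copy of the layout described above, written to a temp directory.
root = tempfile.mkdtemp()
data_dir = os.path.join(root, "mypackage", "data")
os.makedirs(data_dir)
for init_dir in (os.path.join(root, "mypackage"), data_dir):
    with open(os.path.join(init_dir, "__init__.py"), "w") as handle:
        handle.write("")
with open(os.path.join(data_dir, "mydata.txt"), "w", encoding="utf-8") as handle:
    handle.write("hello from mydata")

sys.path.insert(0, root)  # stands in for installing the package

# files() returns a traversable object for the package; joinpath() picks the resource.
text = resources.files("mypackage.data").joinpath("mydata.txt").read_text(encoding="utf-8")
print(text)
```

On Python 3.8, as in the question, the read_text(data, "mydata.txt") form shown above is the equivalent call.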

Include package data when installing

I have a problem when trying to install a Python package that I have created.
The package includes a bitmap image which is used within the package (for OCR).
My folder structure is the following:
mypackage
- mypackage
- media
- template.bmp
- module1.py
- module2.py
- etc...
- tests
- MANIFEST.in
- setup.py
template.bmp is used by the module1.py.
The MANIFEST.in file:
include mypackage/media/template.bmp
The setup.py:
setup(
....
packages = find_packages(exclude=["*.tests", "*.tests.*", "tests.*", "tests"]),
include_package_data=True,
package_data={'mypackage': ['media/template.bmp']},
...
)
When I run
python setup.py sdist
I can verify that the media folder is included along with template.bmp in the .egg file. However, when referencing the bitmap in a module using
directory = os.path.dirname(os.path.abspath(__file__))
template_path = directory + '/media/template.bmp'
cv2.imread(template_path, 0)
I get a file not found error. The directory variable is the following:
'C:\\anaconda3\\lib\\site-packages\\mypackage-0.0.1-py3.6.egg\\mypackage'
Am I missing something?
Using pkg_resources solved my problem.
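import pkg_resources
# Resource names are '/'-separated paths relative to the package, without a leading slash.
template_path = pkg_resources.resource_filename(__name__, 'media/template.bmp')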
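A standard-library alternative, sketched here under the assumption of Python 3.9 or newer, is importlib.resources.as_file, which hands back a real filesystem path even when the package lives inside a zipped egg. The package name demo_ocr and the fake bitmap bytes are invented for the demonstration; a library that needs a file path, such as the cv2.imread call in the question, would be called inside the with block.

```python
import os
import sys
import tempfile
import zipfile
from importlib import resources

# Throwaway zip archive standing in for a zipped egg.
# "demo_ocr" is a hypothetical package name used only for this sketch.
tmp_dir = tempfile.mkdtemp()
egg_path = os.path.join(tmp_dir, "demo_ocr.egg")
with zipfile.ZipFile(egg_path, "w") as archive:
    archive.writestr("demo_ocr/__init__.py", "")
    archive.writestr("demo_ocr/media/template.bmp", b"BM-fake-bitmap")

sys.path.insert(0, egg_path)

# Locate the resource inside the package, wherever the package is stored.
resource = resources.files("demo_ocr").joinpath("media").joinpath("template.bmp")

# as_file() yields a real path; for zipped packages it extracts to a temp file
# that is cleaned up when the block exits, so use the path inside the block.
with resources.as_file(resource) as template_path:
    was_real_file = os.path.isfile(template_path)
    payload = template_path.read_bytes()
print(was_real_file, payload)
```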

Adding json file to MANIFEST.in and installing package results in an error

My package tree looks like this: (with a few more irrelevant files)
- setup.py
- MANIFEST.in
- mydir
|
- file.py
- file.json
setup.py:
from distutils.core import setup
setup(
name = 'mydir',
packages = ['mydir'],
version = '1.2.2',
description = 'desc',
author = 'my name',
author_email = 'my#email.com',
url = 'https://github.com/myname/mydir',
download_url = 'https://github.com/myname/mydir/archive/1.2.2.tar.gz',
keywords = ['key1', 'key2'],
classifiers = [],
)
When the MANIFEST.in file was empty, the json wasn't included in the dist file.
So I've added the json file to the MANIFEST.in so now it contains only:
include mydir/file.json
When I execute the python setup.py sdist command, the auto generated MANIFEST file contains all the necessary files, including file.json.
However, as I try to install my package using pip, I get the following error:
error: can't copy 'file.json': doesn't exist or not a regular file
Got it.
Changed setup.py to use from setuptools import setup, find_packages instead of distutils.core
Also added include_package_data = True, to setup.py:
setup(
...
include_package_data = True,
...
)
Together with the include in the MANIFEST.in, the json file was extracted to the target dir as expected.

setup.py not installing data files

I have a Python library that, in addition to regular Python modules, has some data files that need to go in /usr/local/lib/python2.7/dist-package/mylibrary.
Unfortunately, I have been unable to convince setup.py to actually install the data files there. Note that this behaviour is under install - not sdist.
Here is a slightly redacted version of setup.py
module_list = list_of_files
setup(name ='Modules',
version ='1.33.7',
description ='My Sweet Module',
author ='PN',
author_email ='email',
url ='url',
packages = ['my_module'],
# I tried this. It got installed in /usr/my_module. Not ok.
# data_files = [ ("my_module", ["my_module/data1",
# "my_module/data2"])]
# This doesn't install it at all.
package_data = {"my_module" : ["my_module/data1",
"my_module/data2"] }
)
This is in Python 2.7 (will have to run in 2.6 eventually), and will have to run on some Ubuntu between 10.04 and 12+. Developing it right now on 12.04.
UPD:
package_data accepts dict in format {'package': ['list', 'of?', 'globs*']}, so to make it work, one should specify shell globs relative to package dir, not the file paths relative to the distribution root.
data_files has a different meaning, and, in general, one should avoid using this parameter.
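The point about globs being relative to the package directory can be sketched with plain glob calls. The helper resolve_package_data below is hypothetical and only a simplified stand-in for what the build step does; it is not a setuptools or distutils API. It recreates the my_module layout from the question in a temporary directory and compares the two styles of pattern.

```python
import glob
import os
import tempfile


def resolve_package_data(package_dir, patterns):
    """Return file names under package_dir matched by the given glob patterns.

    Simplified stand-in for the build step; not a setuptools API.
    """
    matches = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(package_dir, pattern)):
            if os.path.isfile(path):
                matches.add(os.path.relpath(path, package_dir))
    return sorted(matches)


# Throwaway layout mirroring the question: my_module/ containing data1 and data2.
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "my_module")
os.makedirs(package_dir)
for name in ("__init__.py", "data1", "data2"):
    with open(os.path.join(package_dir, name), "w") as handle:
        handle.write("")

# Paths relative to the distribution root are looked up inside the package and miss.
from_root = resolve_package_data(package_dir, ["my_module/data1", "my_module/data2"])
# Globs relative to the package directory match.
from_package = resolve_package_data(package_dir, ["data*"])
print(from_root, from_package)
```

Under this reading, the package_data entry in the question would become {"my_module": ["data1", "data2"]} or {"my_module": ["data*"]}.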
With setuptools you only need include_package_data=True, but the data files must be under a version control system known to setuptools (by default it recognizes only CVS and SVN; install setuptools-git or setuptools-hg if you use git or hg).
with setuptools you can:
- in MANIFEST.in:
include my_module/data*
- in setup.py:
setup(
...
include_package_data = True,
...
)
http://docs.python.org/distutils/setupscript.html#installing-additional-files
If directory is a relative path, it is interpreted relative to the
installation prefix (Python’s sys.prefix for pure-Python packages,
sys.exec_prefix for packages that contain extension modules).
This will probably do it:
data_files = [ ("my_module", ["local/lib/python2.7/dist-package/my_module/data1",
"local/lib/python2.7/dist-package/my_module/data2"])]
Or just use join to add the prefix:
data_dir = os.path.join(sys.prefix, "local/lib/python2.7/dist-package/my_module")
data_files = [ ("my_module", [os.path.join(data_dir, "data1"),
os.path.join(data_dir, "data2")])]
The following solution worked fine for me.
You should have MANIFEST.in file where setup.py is located.
Add the following code to the manifest file
recursive-include mypackage *.json *.md # can be extended with more extensions or file names.
Another solution is adding the following code to the MANIFEST.in file.
graft mypackage # will copy the entire package including non-python files.
global-exclude __pycache__ *.txt # list files you don't want to include here.
Now, when you do pip install all the necessary files will be included.
Hope this helps.
UPDATE:
Make sure that you also have include_package_data=True in the setup file

Include non python files in RPM with setuptools

I have some fixture directories that contain xml files that I would like included with my python project when building the RPM with bdist_rpm. I thought I could do this by having MANIFEST.in do a recursive-include * *, however, it does not include anything other than *.py files. Is there any way to have bdist_rpm include non python files in the package or specifically include *.xml files as well?
Where are you trying to install them? If you put them inside a package directory, like this...
myproject/
mypackage/
__init__.py
resources/
file1.xml
file2.xml
...you can use the package_data option in your setup.py file, like this:
from setuptools import setup, find_packages
setup(
name='myproject',
version='0.1',
description='A description.',
packages=find_packages(),
include_package_data=True,
package_data = { '': [ '*.xml' ] },
install_requires=[],
)
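This will include any *.xml files that sit directly inside a package directory. The glob is not applied recursively, so for files in a subdirectory such as resources/ the pattern would need to be 'resources/*.xml'. They'll get installed with the rest of your package(s) somewhere inside of the Python library path. You can do the same thing with a MANIFEST.in that looks like this: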
recursive-include * *.xml
If you're trying to install them into specific filesystem locations outside of the Python library, I'm not sure if you can do that via setup.py.
You can use the data_files parameter of setup() to do what you need. Something like this:
setup(
...
data_files = [ ('/usr/share/yourapp/xmls', [ 'xmls/1.xml', 'xmls/2.xml' ]) ],
...
)
This would install following files:
/usr/share/yourapp/xmls/1.xml
/usr/share/yourapp/xmls/2.xml
I generally create the list of files in a function like this:
import os

def get_xmls():
    # Collect every .xml file in the xmls/ directory, relative to setup.py.
    xmlfiles = []
    for filename in os.listdir('xmls/'):
        if filename.endswith('.xml'):
            xmlfiles.append('xmls/%s' % filename)
    return xmlfiles
