Finding a file in a Python module distribution [duplicate]

This question already has answers here:
How to read a (static) file from inside a Python package?
(6 answers)
Closed 2 years ago.
I've written a Python package that includes a bsddb database of pre-computed values for one of the more time-consuming computations. For simplicity, my setup script installs the database file in the same directory as the code which accesses the database (on Unix, something like /usr/lib/python2.5/site-packages/mypackage/).
How do I store the final location of the database file so my code can access it? Right now, I'm using a hack based on the __file__ variable in the module which accesses the database:
dbname = os.path.join(os.path.dirname(__file__), "database.dat")
It works, but it seems... hackish. Is there a better way to do this? I'd like to have the setup script just grab the final installation location from the distutils module and stuff it into a "dbconfig.py" file that gets installed alongside the code that accesses the database.

Try using pkg_resources, which is part of setuptools (and available on all of the pythons I have access to right now):
>>> import pkg_resources
>>> pkg_resources.resource_filename(__name__, "foo.config")
'foo.config'
>>> pkg_resources.resource_filename('tempfile', "foo.config")
'/usr/lib/python2.4/foo.config'
There's more discussion about using pkg_resources to get resources on the eggs page and the pkg_resources page.
Also note, where possible it's probably advisable to use pkg_resources.resource_stream or pkg_resources.resource_string because if the package is part of an egg, resource_filename will copy the file to a temporary directory.

Use pkgutil.get_data. It’s the cousin of pkg_resources.resource_stream, but in the standard library, and should work with flat filesystem installs as well as zipped packages and other importers.
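As a rough illustration of that call, here is a minimal sketch. It builds a throwaway package in a temporary directory so that it runs on its own; the package name demo_pkg and the file contents are assumptions made for the example, not part of the original question.

```python
import os
import pkgutil
import sys
import tempfile

# Build a throwaway package on disk so the example is self-contained.
# "demo_pkg" and its contents are hypothetical stand-ins for the real package.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "database.dat"), "wb") as f:
    f.write(b"precomputed values")

# Make the package importable, as an installed package would be.
sys.path.insert(0, root)

# get_data returns the resource contents as bytes. The resource path is
# relative to the package and uses "/" as the separator on every platform.
db_bytes = pkgutil.get_data("demo_pkg", "database.dat")
print(db_bytes)
```

Because get_data hands back the whole file as bytes, it suits small resources best; a library that insists on a real filesystem path (as a database driver may) would still need the bytes written out somewhere first.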

That's probably the way to do it, without resorting to something more advanced like using setuptools to install the files where they belong.
Notice there's a problem with that approach: on OSes with a real security framework (UNIXes, etc.), the user running your script might not have the rights to access the DB in the system directory where it gets installed.

Use the importlib.resources module, which has been in the standard library since Python 3.7
and is more efficient than setuptools' pkg_resources
(on earlier Python versions, use the backported importlib_resources library).
Attention: For this to work, the folder where the data file resides must be a regular Python package. That means you must add an __init__.py file to it, if one is not already there.
Then you can access it like this:
try:
    import importlib.resources as importlib_resources
except ImportError:
    # In PY<3.7 fall-back to backported `importlib_resources`.
    import importlib_resources
# Note that the actual package could have been used,
# not just its (string) name, with something like:
#     from XXX import YYY as data_pkg
# A bare '.' is not an importable package name, so name the package
# that contains the data file ('mypackage' in the question).
data_pkg = 'mypackage'
fname = 'database.dat'
db_bytes = importlib_resources.read_binary(data_pkg, fname)
# or if a file-like stream is needed:
with importlib_resources.open_binary(data_pkg, fname) as db_file:
    ...
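On Python 3.9 and later, the same module also offers importlib.resources.files(), which can do the same job. The following is a minimal sketch under stated assumptions: the package demo_data_pkg is created on the fly purely so the example runs, and is not part of the original answer.

```python
import importlib.resources
import os
import sys
import tempfile

# Hypothetical throwaway package, created only to make the sketch runnable.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_data_pkg")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "database.dat"), "wb") as f:
    f.write(b"\x00\x01\x02")
sys.path.insert(0, root)

# files() returns a Traversable for the package; "/" joins path components.
resource = importlib.resources.files("demo_data_pkg") / "database.dat"

# Read everything at once ...
db_bytes = resource.read_bytes()

# ... or open a binary file-like stream.
with resource.open("rb") as db_file:
    header = db_file.read(2)
```

The package can be given either as a string name or as the imported module object.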

Related

What is the cleanest way to add a directory of third-party packages to the beginning of the Python path?

My context is appengine_config.py, but this is really a general Python question.
Given that we've cloned a repo of an app that has an empty directory lib in it, and that we populate lib with packages by using the command pip install -r requirements.txt --target lib, then:
dirname = 'lib'
dirpath = os.path.join(os.path.dirname(__file__), dirname)
For importing purposes, we can add such a filesystem path to the beginning of the Python path in the following way (we use index 1 because the first position should remain '.', the current directory):
sys.path.insert(1, dirpath)
However, that won't work if any of the packages in that directory are namespace packages.
To support namespace packages we can instead use:
site.addsitedir(dirpath)
But that appends the new directory to the end of the path, which we don't want in case we need to override a platform-supplied package (such as WebOb) with a newer version.
The solution I have so far is this bit of code which I'd really like to simplify:
sys.path, remainder = sys.path[:1], sys.path[1:]
site.addsitedir(dirpath)
sys.path.extend(remainder)
Is there a cleaner or more Pythonic way of accomplishing this?
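For reference, the three-line workaround above can be wrapped in a small helper so the splice happens in one place. This is only a sketch of the same technique, not a different solution; the function name prepend_site_dir and the temporary directory standing in for lib are assumptions made for the example.

```python
import os
import site
import sys
import tempfile


def prepend_site_dir(dirpath, index=1):
    """Add dirpath via site.addsitedir, placing the new entries at `index`.

    Hypothetical helper; it uses the same splice technique as the question.
    """
    head, tail = sys.path[:index], sys.path[index:]
    sys.path[:] = head        # truncate in place, keeping the same list object
    site.addsitedir(dirpath)  # appends dirpath and processes its .pth files
    sys.path.extend(tail)     # put the remaining entries back after the new ones


# Usage with a throwaway directory standing in for "lib".
lib_dir = tempfile.mkdtemp()
prepend_site_dir(lib_dir)
print(sys.path[1])
```

Assigning to sys.path[:] rather than rebinding sys.path keeps any other references to the list valid.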

For this answer I assume you know how to use setuptools and setup.py.
Assuming you would like to use the standard setuptools workflow for development, I recommend using this code snippet in your appengine_config.py:
import os
import sys
if os.environ.get('CURRENT_VERSION_ID') == 'testbed-version':
    # If we are unittesting, fake the non-existence of appengine_config.
    # The error message of the import error is handled by gae and must
    # exactly match the proper string.
    raise ImportError('No module named appengine_config')
# Imports are done relative because Google app engine prohibits
# absolute imports.
lib_dir = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'libs')
# Add every library to sys.path.
if os.path.isdir(lib_dir):
    for lib in os.listdir(lib_dir):
        if lib.endswith('.egg'):
            lib = os.path.join(lib_dir, lib)
            # Insert to override default libraries such as webob 1.1.1.
            sys.path.insert(0, lib)
And this piece of code in setup.cfg:
[develop]
install-dir = libs
always-copy = true
If you type python setup.py develop, the libraries are downloaded as eggs into the libs directory. appengine_config then inserts them into sys.path.
We use this at work to include webob==1.3.1 and internal packages which are all namespaced using our company namespace.

You may want to have a look at the answers in the Stack Overflow thread, "How do I manage third-party Python libraries with Google App Engine? (virtualenv? pip?)," but for your particular predicament with namespace packages, you're running up against a long-standing issue I filed against site.addsitedir's behavior of appending to sys.path instead of inserting after the first element. Please feel free to add to that discussion with a link to this use case.
I do want to address something else that you said that I think is misleading:
"My context is appengine_config.py, but this is really a general Python question."
The question actually arises from the limitations of Google App Engine: it cannot install third-party packages, hence the search for a workaround based on manually adjusting sys.path and using site.addsitedir. In general Python development, if your code uses these, you're Doing It Wrong.
The Python Packaging Authority (PyPA) describes the best practices to put third party libraries on your path, which I outline below:
1. Create a virtualenv.
2. Mark out your dependencies in your setup.py and/or requirements files (see PyPA's "Concepts and Analyses").
3. Install your dependencies into the virtualenv with pip.
4. Install your project itself into the virtualenv with pip and the -e/--editable flag.
Unfortunately, Google App Engine is incompatible with virtualenv and with pip. GAE chose to block this toolset in an attempt to sandbox the environment. Hence, one must use hacks to work around the limitations of GAE to use additional or newer third-party libraries.
If you dislike this limitation and want to use standard Python tooling for managing third-party package dependencies, other Platform as a Service providers out there eagerly await your business.

How to import 3rd party module to python if its name is taken?

I want to use this class: google-mail-oauth2-tools, but if I do something like import oauth2, what gets imported is python-oauth2, which is deprecated and doesn't support OAuth2 (even though the name is oauth2).
How can I use the Google module? Do I need to install it first?

Looking at the docs you linked, it looks like the Google Mail oauth2 module is meant to be downloaded and used in-place.
You can, of course, install it… but you can't have two (top-level) modules installed with the same name, so you'd have to uninstall python-oauth2 first.
But if you just use it in-place, in Python 2.7, you can have an oauth2.py in one directory and one in the stdlib. Whichever one you import first will "win"; any subsequent attempts to import oauth2 will get the first one.
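To force it to import the right one (in a way that works for Python 2.x and for Python 3 up to 3.11), you may want to use the imp module to give it the path explicitly. Note that imp has been deprecated since Python 3.4 and was removed in Python 3.12, where importlib is the replacement.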
For example, if you plan to put oauth2.py right alongside the script that imports it, instead of just import oauth2, do:
import imp
import os
# Look for oauth2 only in the directory that contains this script.
script_path = os.path.abspath(os.path.dirname(__file__))
f, path, desc = imp.find_module('oauth2', [script_path])
oauth2 = imp.load_module('oauth2', f, path, desc)
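On current Python 3 releases, where imp is no longer available, the same idea can be expressed with importlib.util. The sketch below is an illustration under stated assumptions: it writes a placeholder oauth2.py into a temporary directory so that it runs on its own, and that placeholder file is hypothetical, not Google's module.

```python
import importlib.util
import os
import sys
import tempfile

# Stand-in for the directory that holds the downloaded oauth2.py;
# the file written here is a hypothetical placeholder, not Google's module.
script_path = tempfile.mkdtemp()
module_file = os.path.join(script_path, "oauth2.py")
with open(module_file, "w") as f:
    f.write("WHO = 'local copy'\n")

# Build a module spec from the explicit file path, bypassing sys.path lookup.
spec = importlib.util.spec_from_file_location("oauth2", module_file)
oauth2 = importlib.util.module_from_spec(spec)
# Register before executing so later `import oauth2` statements reuse it.
sys.modules["oauth2"] = oauth2
spec.loader.exec_module(oauth2)

print(oauth2.WHO)
```

In a real script, script_path would be os.path.abspath(os.path.dirname(__file__)), as in the imp version above.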
… although in some situations, you can get away with cheating by assuming the current working directory is the script directory, and/or by permanently munging sys.path, etc., so you can simplify it in various different ways—ultimately, if it's safe, just this:
import sys
sys.path = ['.'] + sys.path
import oauth2
Still, I would consider doing one of the following for safety (and readability):
Uninstall python-oauth2.
Rename the downloaded oauth2.py to something else, like google_oauth2.py, and then import google_oauth2.
Put the downloaded oauth2.py into a package, so you can, e.g., import googletools.oauth2.

Including a Python Library (suds) in a portable way

I'm using suds (brilliant library, btw), and I'd like to make it portable (so that everyone who uses the code that relies on it, can just checkout the files and run it).
I have tracked down 'suds-0.4-py2.6.egg' (in python/lib/site-packages), and put it in with my files, and I've tried:
import path.to.egg.file.suds
from path.to.egg.file.suds import *
import path.to.egg.file.suds-0.4-py2.6
The first two complain that suds doesn't exist, and the last one has invalid syntax.
In the __init__.py file, I have:
__all__ = [ "FileOne" ,
"FileTwo",
"suds-0.4-py2.6"]
and have previously tried
__all__ = [ "FileOne" ,
"FileTwo",
"suds"]
but neither works.
Is this the right way of going about it? If so, how can I get my imports to work? If not, how else can I achieve the same result?
Thanks

You must add your egg file to sys.path, like this:
import sys
# insert at 0 instead of appending to end to take precedence
# over system-installed suds (if there is one).
sys.path.insert(0, "suds-0.4-py2.6.egg")
import suds
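To see why this works, note that Python can import from any zip archive listed on sys.path. The following is a minimal sketch that builds its own small archive; the archive name demo-0.1-py3.egg and the package demo_egg_pkg are hypothetical stand-ins for the suds egg.

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny zip archive containing one package. "demo_egg_pkg" and the
# archive name are hypothetical; they stand in for the suds egg.
archive = os.path.join(tempfile.mkdtemp(), "demo-0.1-py3.egg")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_egg_pkg/__init__.py", "VERSION = '0.1'\n")

# Putting the archive itself on sys.path lets the zipimport machinery
# find the package inside it, much as it would in a directory entry.
sys.path.insert(0, archive)

import demo_egg_pkg

print(demo_egg_pkg.VERSION)
print(demo_egg_pkg.__file__)
```

The sketch uses an absolute path to the archive. A relative entry such as "suds-0.4-py2.6.egg" is resolved against the current working directory, so it only works when the program is started from the directory that holds the egg.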

.egg files are zipped archives; hence you cannot directly import them as you have discovered.
The easy way is to simply unzip the archive, and then copy the suds directory to your application's source code directory. Since Python stops at the first matching module it discovers, your local copy of suds will be used even if it is not installed globally for Python.
One step up from that is to add the egg to your path by appending it to sys.path.
However, the proper way would be to package your application for distribution, or to provide a requirements file that lets other people know what external packages your program depends on.

Usually I distribute my program with a requirements.txt file that contains all dependencies and their versions.
The users can then install these libraries with:
pip install -r requirements.txt
I don't think including eggs with your code is a good idea: what if the user runs Python 2.7 instead of Python 2.6?
More info about requirement file: http://www.pip-installer.org/en/latest/requirements.html

How to make an "always relative to current module" file path?

Let's say you have a module which contains
myfile = open('test.txt', 'r')
And the 'test.txt' file is in the same folder. If you run the module, the file will be opened successfully.
Now, let's say you import that module from another one which is in another folder. The file won't be searched in the same folder as the module where that code is.
So how to make the module search files with relative paths in the same folder first?
There are various solutions using "__file__" or "os.getcwd()", but I'm hoping there's a cleaner way, like some special character in the string you pass to open() or file().

The solution is to use __file__ and it's pretty clean:
import os
TEST_FILENAME = os.path.join(os.path.dirname(__file__), 'test.txt')

For normal modules loaded from .py files, the __file__ should be present and usable. To join the information from __file__ onto your relative path, there's a newer option than os.path interfaces available since 2014:
from pathlib import Path
here = Path(__file__).parent
fname = here / "test.txt"
with fname.open() as f:
    ...
pathlib was added to Python in 3.4 (see PEP 428). For users still on Python 2.7 wanting to use the same APIs, a backport is available.
Note that when you're working with a Python package, there are better approaches available for reading resources: consider moving to importlib.resources (in the standard library from Python 3.7; for older versions you can use pkgutil). One advantage of packaging the resources correctly, rather than joining data files relative to the source tree, is that the code will still work in cases where it's not extracted on a filesystem (e.g. a package in a zipfile). See How to read a (static) file from inside a Python package? for more details about reading/writing data files in a package.

Getting Python to use the ActiveTcl libraries

Is there any way to get Python to use my ActiveTcl installation instead of having to copy the ActiveTcl libraries into the Python/tcl directory?

Not familiar with ActiveTcl, but in general here is how to get a package/module to be loaded when that name already exists in the standard library:
import sys
dir_name = "/usr/lib/mydir"
sys.path.insert(0, dir_name)
Substitute the value for dir_name with the path to the directory containing your package/module, and run the above code before anything is imported. This is often done through a 'sitecustomize.py' file so that it will take effect as soon as the interpreter starts up so you won't need to worry about import ordering.
