Trying to read JSON file within a Python package - python

I am in the process of packaging up a python package that I'll refer to as MyPackage.
The package structure is:
MyPackage/
script.py
data.json
The data.json file comprises cached data that is read in script.py.
I have figured out how to include data files (use of setuptools include_package_data=True and to also include path to data file in the MANIFEST.in file) but now when I pip install this package and import the installed MyPackage (currently testing install by pip from the GitHub repository) I get a FileNotFound exception (data.json) in the script that is to utilize MyPackage. However, I see that the data.json file is indeed installed in Lib/site-packages/MyPackage.
Am I doing something wrong here by trying to read in a json file in a package?
Note that in script.py I am attempting to read data.json as open('data.json', 'r')
Am I screwing up something regarding the path to the data file?

You're not screwing something up, accessing package resources is just a little tricky - largely because they can be packaged in formats where your .json might strictly speaking not exist as an actual file on the system where your package is installed (e.g. as zip-app). The right way to access your data file is not by specifying a path to it (like "MyPackage/data.json"), but by accessing it as a resource of your installed package (like "MyPackage.data.json"). The distinction might seem pedantic, but it can matter a lot.
Anyway, the access should be done using the builtin importlib.resources module:
import importlib.resources
import json
with importlib.resources.open_text("MyPackage", "data.json") as file:
data = json.load(file)
# you should be able to access 'data' like a dictionary here
If you happen to work on a python version lower than 3.7, you will have to install it as importlib_resources from pyPI.

I resolved the issue by getting the 'relative path' to where the package is.
self.data = self.load_data(path=os.path.join(
os.path.dirname(os.path.abspath(__file__)),
'data.json'))
load_data just reads the data file
Any constructive criticism is still very much welcome. Not trying to write stupid code if I can't help it :)

Related

Can python snippets be placed in pyproject.toml?

I would like to read the "version" from a file
# The "version" is no longer taken from here but rather from the "version" file
version = "0.3.13" # This should be read from a file..
How can this be done inside the pyproject.toml .. or is that not possible? If not possible is there a way to read the contents from a file into a variable inside the pyproject.toml? Ultimately is that file just static ?
toml is a static file format. See https://toml.io So there is no way to read content from a different place.
Because you are talking about pyproject.toml and version I guess your goal is to have single source of truth for the version number of your python project, so that it can be used during runtime. If so the correct way to do it, to have the version in the pyproject.toml. Once the project is installed you can use importlib.metadata to get the version number in your program:
import importlib.metadata
version = importlib.metadata.version("my-package")

Python modules' paths

In some Python scripts I see the following imports:
import fileA
import someDir.fileB
from fileC import functionA
There exist corresponding files fileA.py, someDir/fileB.py and fileC.py. However, while looking in the Requests source code, I found this in the __init__.py file:
from requests.packages.urllib3.contrib import pyopenssl
In this case, requests is the CWD and packages.urllib3.contrib.pyopenssl.py is the file. Why does this defy convention? I do see that the packages.urllib3.contrib directory does also have a __init__.py file, which seems to be related.
Furthermore, I'm not sure if it is related but I think it is so I post it here. In my script I have the folder kennethreitz/requests, since the application depends on the Requests module but I'm deploying it to environments which might not have Requests installed. However, simply adding to the file import kennethreitz.requests is not including the Requests module. I import kennethreitz.requests.__init__ and a few other obvious permutations but I cannot get the module to import. How can I package Requests with my code? The obvious Google searches are not helping.
requests is using an absolute import. You cannot arbitrarily nest packages into other directories and still expect things to work.
Instead, add the kennethreitz directory (which should not have a __init__.py file) to your sys.path module search path. That way the requests module will still be importable as a top-level package.
Next, you may want to look into Python packaging, dependencies and using a tool like pip or zc.buildout to deploy your code for you. Those tools handle dependencies for you and will install requests as required. See the Python Packaging User Guide for an introduction.

Is there a faster method to load a yaml file than the standard .load method? Django/Python

I am loading a big yaml file and it is taking forever. I am wondering if there is a faster method than the yaml.load() method.
I have read that there is a CLoader method but havent been able to run it.
The website that suggested this CLoader method asks me to do this:
Download the source package PyYAML-3.08.tar.gz and unpack it.
Go to the directory PyYAML-3.08 and run:
$ python setup.py install
If you want to use LibYAML bindings, which are much faster than the pure Python version, you need to download and install LibYAML.
Then you may build and install the bindings by executing
$ python setup.py --with-libyaml install
In order to use LibYAML based parser and emitter, use the classes CParser and CEmitter:
from yaml import load, dump
try:
from yaml import CLoader as Loader, CDumper as Dumper
except ImportError:
from yaml import Loader, Dumper
This looks like this will work but I dont have a setup.py directory anywhere in my Django project and therefore can't install/import any of these things
Can anyone help me figure out how to do this or let me know about another faster loading method??
Thanks for the help!!
I have no idea what's faster - bspymaster's ideas might be the most useful.
When you download PyYAML-3.08.tar.gz, inside the archive there will be a setup.py what you can run.
Note to use LibYAML, download this: http://pyyaml.org/download/libyaml/yaml-0.1.4.tar.gz
And run using the instructions from http://pyyaml.org/wiki/LibYAML
You will need a set a build tools, which should be installed on linux/unix, for osx make sure xcode is installed, and I'm not sure about windows.

Including a Python Library (suds) in a portable way

I'm using suds (brilliant library, btw), and I'd like to make it portable (so that everyone who uses the code that relies on it, can just checkout the files and run it).
I have tracked down 'suds-0.4-py2.6.egg' (in python/lib/site-packages), and put it in with my files, and I've tried:
import path.to.egg.file.suds
from path.to.egg.file.suds import *
import path.to.egg.file.suds-0.4-py2.6
The first two complain that suds doesn't exist, and the last one has invalid syntax.
In the __init__.py file, I have:
__all__ = [ "FileOne" ,
"FileTwo",
"suds-0.4-py2.6"]
and have previously tried
__all__ = [ "FileOne" ,
"FileTwo",
"suds"]
but neither work.
Is this the right way of going about it? If so, how can I get my imports to work. If not, how else can I achieve the same result?
Thanks
You must add your egg file to sys.path, like this:
import sys
# insert at 0 instead of appending to end to take precedence
# over system-installed suds (if there is one).
sys.path.insert(0, "suds-0.4-py2.6.egg")
import suds
.egg files are zipped archives; hence you cannot directly import them as you have discovered.
The easy way is to simply unzip the archive, and then copy the suds directory to your application's source code directory. Since Python will stop at the first module it discovers; your local copy of suds will be used even if it is not installed globally for Python.
One step up from that, is to add the egg to your path by appending it to sys.path.
However, the proper way would be to package your application for distribution; or provide a requirements file that lets other people know what external packages your program depends on.
Usually I distribute my program with a requirements.txt file that contain all dependencies and their version.
The users can then install these libraries with:
pip install -r requirements.txt
I don't think including eggs with your code is a good idea, what if the user use python2.7 instead of python2.6
More info about requirement file: http://www.pip-installer.org/en/latest/requirements.html

Finding a file in a Python module distribution [duplicate]

This question already has answers here:
How to read a (static) file from inside a Python package?
(6 answers)
Closed 2 years ago.
I've written a Python package that includes a bsddb database of pre-computed values for one of the more time-consuming computations. For simplicity, my setup script installs the database file in the same directory as the code which accesses the database (on Unix, something like /usr/lib/python2.5/site-packages/mypackage/).
How do I store the final location of the database file so my code can access it? Right now, I'm using a hack based on the __file__ variable in the module which accesses the database:
dbname = os.path.join(os.path.dirname(__file__), "database.dat")
It works, but it seems... hackish. Is there a better way to do this? I'd like to have the setup script just grab the final installation location from the distutils module and stuff it into a "dbconfig.py" file that gets installed alongside the code that accesses the database.
Try using pkg_resources, which is part of setuptools (and available on all of the pythons I have access to right now):
>>> import pkg_resources
>>> pkg_resources.resource_filename(__name__, "foo.config")
'foo.config'
>>> pkg_resources.resource_filename('tempfile', "foo.config")
'/usr/lib/python2.4/foo.config'
There's more discussion about using pkg_resources to get resources on the eggs page and the pkg_resources page.
Also note, where possible it's probably advisable to use pkg_resources.resource_stream or pkg_resources.resource_string because if the package is part of an egg, resource_filename will copy the file to a temporary directory.
Use pkgutil.get_data. It’s the cousin of pkg_resources.resource_stream, but in the standard library, and should work with flat filesystem installs as well as zipped packages and other importers.
That's probably the way to do it, without resorting to something more advanced like using setuptools to install the files where they belong.
Notice there's a problem with that approach, because on OSes with real a security framework (UNIXes, etc.) the user running your script might not have the rights to access the DB in the system directory where it gets installed.
Use the standard Python-3.7 library's importlib.resources module,
which is more efficient than setuptools:pkg_resources
(on previous Python versions, use the backported importlib_resources library).
Attention: For this to work, the folder where the data-file resides must be a regular python-package. That means you must add an __init__.py file into it, if not already there.
Then you can access it like this:
try:
import importlib.resources as importlib_resources
except ImportError:
# In PY<3.7 fall-back to backported `importlib_resources`.
import importlib_resources
## Note that the actual package could have been used,
# not just its (string) name, with something like:
# from XXX import YYY as data_pkg
data_pkg = '.'
fname = 'database.dat'
db_bytes = importlib_resources.read_binary(data_pkg, fname)
# or if a file-like stream is needed:
with importlib_resources.open_binary(data_pkg, fname) as db_file:
...

Categories