Include entire directory in python setup.py data_files

The data_files parameter for setup() takes input in the following format:
setup(...
    data_files = [(target_directory, [list of files to be put there])]
    ....)
Is there a way for me to specify an entire directory of data instead, so I don't have to name each file individually and update the list every time my project's implementation changes?
I attempted to use os.listdir(), but I don't know how to do that with relative paths; I couldn't use os.getcwd() or os.path.realpath(__file__), since those don't point to my repository root correctly.

karelv has the right idea, but to answer the stated question more directly:
from glob import glob
setup(
    #...
    data_files = [
        ('target_directory_1', glob('source_dir/*')),  # files in source_dir only - not recursive
        ('target_directory_2', glob('nested_source_dir/**/*', recursive=True)),  # includes sub-folders - recursive
        # etc...
    ],
    #...
)
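Note that glob() also matches the directories themselves, and a directory in a data_files file list will make setup() blow up (see the os.walk() answer below). A minimal filter, assuming you only want regular files:
import os
from glob import glob
# keep only plain files; sub-directories matched by ** would otherwise
# end up in the data_files list and break setup()
files_only = [f for f in glob('nested_source_dir/**/*', recursive=True)
              if os.path.isfile(f)]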

import glob
for filename in glob.iglob('inner_dir/**/*', recursive=True):
    print(filename)
Doing this, you directly get a list of files, with paths relative to the current directory.

I don't know how to do that with relative paths
You need to get the path of the directory first, so...
Say you have this directory structure:
cur_directory
|- setup.py
|- inner_dir
   |- file2.py
To get the directory of the current file (in this case setup.py), use this:
cur_directory_path = os.path.abspath(os.path.dirname(__file__))
Then, to get a directory path relative to cur_directory, just join some other directories, e.g.:
inner_dir_path = os.path.join(cur_directory_path, 'inner_dir')
If you want to move up a directory, just use "..", for example:
parent_dir_path = os.path.join(cur_directory_path, '..')
Once you have that path, you can call os.listdir() on it.
For completeness:
If you want the path of a file, in this case "file2.py", relative to setup.py, you could do:
file2_path = os.path.join(cur_directory_path, 'inner_dir', 'file2.py')
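Putting this together with the original question, a minimal sketch (directory and target names assumed) that builds a data_files entry from a folder next to setup.py:
import os
# resolve paths from setup.py's own location rather than the cwd
cur_directory_path = os.path.abspath(os.path.dirname(__file__))
inner_dir_path = os.path.join(cur_directory_path, 'inner_dir')
data_files = [('target_directory',
               [os.path.join('inner_dir', name)
                for name in os.listdir(inner_dir_path)
                if os.path.isfile(os.path.join(inner_dir_path, name))])]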

I ran into the same problem with directories containing nested subdirectories. The glob solutions didn't work: they include the directories themselves in the list, which setup would blow up on, and when I excluded the matched directories, it still dumped all the files into the same target directory, which is not what I wanted either. I ended up just falling back on os.walk():
import os
import sys
from itertools import chain

def generate_data_files():
    data_files = []
    data_dirs = ('data', 'plugins')
    for path, dirs, files in chain.from_iterable(os.walk(data_dir) for data_dir in data_dirs):
        install_dir = os.path.join(sys.prefix, 'share/<my-app>/' + path)
        list_entry = (install_dir, [os.path.join(path, f) for f in files if not f.startswith('.')])
        data_files.append(list_entry)
    return data_files
and then setting data_files=generate_data_files() in the setup() block.

With nested subdirectories, if you want to preserve the original directory structure, you can use os.walk(), as proposed in another answer.
However, an easier solution uses the pbr library, which extends setuptools. See here for documentation on how to use it to install an entire directory structure:
https://docs.openstack.org/pbr/latest/user/using.html#files
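For reference, a rough sketch of what the corresponding setup.cfg [files] section might look like (target and source directory names assumed from the os.walk answer above; see the linked pbr docs for the exact syntax):
[files]
data_files =
    share/my-app/data = data/*
    share/my-app/plugins = plugins/*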

Related

Python os.walk Include only specific folders

I am writing a Python script that takes user input in the form of a date, e.g. 20180829, which will be a subdirectory name. It then uses the os.walk function to walk through a specific directory, and once it reaches the directory that was passed in, it jumps inside, looks at all the directories within it, and recreates that directory structure in a different location.
My directory structure will look something like this:
|dir1
|-----|dir2
|-----------|dir3
|-----------------|20180829
|-----------------|20180828
|-----------------|20180827
|-----------------|20180826
So dir3 will have a number of subfolders which will all be in the format of a date. I need to be able to copy the directory structure of just the directory that is passed in at the start, e.g. 20180829, and skip the rest of the directories.
I have been looking online for a way to do this but all I can find is ways of excluding directories from the os.walk function, like in the thread below:
Filtering os.walk() dirs and files
I also found a thread that allows me to print out the directory paths that I want but will not let me create the directories I want:
Python 3.5 OS.Walk for selected folders and include their subfolders.
The following is the code I have; it prints out the correct directory structure but creates the entire directory structure in the new location, which I don't want it to do.
includes = '20180828'
inputpath = 'Desktop'
outputpath = 'Documents'
for startFilePath, dirnames, filenames in os.walk(inputpath, topdown=True):
    endFilePath = os.path.join(outputpath, startFilePath)
    if not os.path.isdir(endFilePath):
        os.mkdir(endFilePath)
    for filename in filenames:
        if (includes in startFilePath):
            print(includes, "+++", startFilePath)
            break
I am not sure if I understand what you need, but I think you overcomplicate a few things. If the code below doesn't help you, let me know and we will think about other approaches.
I ran this to create an example like yours.
# setup example project structure
import os
import sys
PLATFORM = 'windows' if sys.platform.startswith('win') else 'linux'
DESKTOP_DIR = \
    os.path.join(os.path.expanduser('~'), 'Desktop') \
    if PLATFORM == 'linux' \
    else os.path.join(os.environ['USERPROFILE'], 'Desktop')
example_dirs = ['20180829', '20180828', '20180827', '20180826']
for _dir in example_dirs:
    path = os.path.join(DESKTOP_DIR, 'dir_from', 'dir_1', 'dir_2', 'dir_3', _dir)
    os.makedirs(path, exist_ok=True)
And here's what you need.
# do what you want to do
dir_from = os.path.join(DESKTOP_DIR, 'dir_from')
dir_to = os.path.join(DESKTOP_DIR, 'dir_to')
target = '20180828'
for root, dirs, files in os.walk(dir_from, topdown=True):
    for _dir in dirs:
        if _dir == target:
            path = os.path.join(root, _dir).replace(dir_from, dir_to)
            os.makedirs(path, exist_ok=True)
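If you also need the files inside the matched directory, not just the empty structure, shutil.copytree can stand in for os.makedirs; a minimal sketch under the same assumed layout:
import shutil
for root, dirs, files in os.walk(dir_from, topdown=True):
    for _dir in dirs:
        if _dir == target:
            src = os.path.join(root, _dir)
            dst = src.replace(dir_from, dir_to)
            # copies the whole subtree, files included; copytree requires
            # that dst not exist yet (Python 3.8+ adds dirs_exist_ok=True)
            shutil.copytree(src, dst)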

What is the difference between root and base directory?

I am trying to use shutil's make_archive function,
documented here: https://docs.python.org/2/library/shutil.html#archiving-operations
but I can't understand the difference between root_dir and base_dir.
Here's a simple piece of code using make_archive:
#!/usr/bin/python
from os import path
from os import curdir
from shutil import make_archive
# Setting current Directory
current = path.realpath(curdir)
# Now Compressing
make_archive("Backup", "gztar", current)
This will create an archive named Backup.tar.gz whose contents all sit under a single . directory.
I don't want the . directory; I want the actual contents inside the archive.
root_dir is the directory the archive's contents are rooted at; archiving effectively changes into it before collecting files.
base_dir is the directory, relative to root_dir, whose content you want to pack; its path prefix is kept inside the archive.
For example, if you have a directory tree like:
/home/apast/git/someproject
And you want to build a package for the someproject folder, you can set:
root_dir="/home/apast/git"
base_dir="someproject"
If the contents of your tree are like the following, for example:
/home/apast/git/someproject/test.py
/home/apast/git/someproject/model.py
The content of your package will acquire the following structure:
someproject/test.py
someproject/model.py
The archive file itself is written to the path given by base_name, relative to the current working directory.
As the doc shows, by default root_dir and base_dir are initialized to your current working directory (cwd, or curdir). But you can use them in a more flexible way.
Let's consider the following dir structure:
/home/apast/git/web/tornado.py
/home/apast/git/web/setup.py
/home/apast/git/core/service.py
/home/apast/git/mobile/gui.py
/home/apast/git/mobile/restfulapi.py
We will try two snippets to clarify:
1. defining base_dir
2. without defining base_dir
Defining base_dir, we specify which directory we will include in our file:
from shutil import make_archive
root_dir = "/home/apast/git/"
make_archive(base_name="/tmp/outputfile",
             format="gztar",
             root_dir=root_dir,
             base_dir="web")
This code will generate a file called /tmp/outputfile.tar.gz with the following structure:
web/tornado.py
web/setup.py
Running without base_dir, like the following:
from shutil import make_archive
root_dir = "/home/apast/git/"
make_archive(base_name="/tmp/outputfile",
             format="gztar",
             root_dir=root_dir)
It will produce a file containing:
web/tornado.py
web/setup.py
core/service.py
mobile/gui.py
mobile/restfulapi.py
To pick out specific folders beyond that, it may be necessary to use some other technique, like the gzip lib directly.
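Applied to the original question, a small sketch (paths assumed) that avoids the . entries by rooting the archive one level up and naming the current folder as base_dir:
import os
from shutil import make_archive
current = os.path.realpath(os.curdir)
# entries then appear as <folder-name>/... instead of ./...
make_archive("Backup", "gztar",
             root_dir=os.path.dirname(current),
             base_dir=os.path.basename(current))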
cd into the root_dir... then tar up the base_dir...
The docs confused me too; reading the source code makes it clear.
It's a bit confusing if you read the documentation, but seeing it visually can help quite a bit.
The root_dir is the directory that the archive's paths are taken relative to.
If I were archiving content under C:\Users\Elipzer\Desktop\MyFolder\, that would be my root_dir.
The base_dir is the part added onto the root_dir, so if I only wanted ...\MyFolder\MySubFolder\ in the archive, I would put MySubFolder as the base_dir.
In many cases there is no need to set these, since you can just change the current directory to the one you want to archive and let make_archive use its defaults (root_dir and base_dir both default to the current directory).

How to add package data recursively in Python setup.py?

I have a new library that has to include a lot of subfolders of small datafiles, and I'm trying to add them as package data. Imagine I have my library laid out like so:
library
- foo.py
- bar.py
- data
  - subfolderA
    - subfolderA1
    - subfolderA2
  - subfolderB
    - subfolderB1
...
I want to add all of the data in all of the subfolders through setup.py, but it seems like I would have to manually go into every single subfolder (there are 100 or so) and add an __init__.py file. Furthermore, will setup.py find these files recursively, or do I need to manually add all of them in setup.py like:
package_data={
    'mypackage.data.folderA': ['*'],
    'mypackage.data.folderA.subfolderA1': ['*'],
    'mypackage.data.folderA.subfolderA2': ['*']
},
I can do this with a script, but it seems like a super pain. How can I achieve this in setup.py?
PS, the hierarchy of these folders is important because this is a database of material files and we want the file tree to be preserved when we present them in a GUI to the user, so it would be to our advantage to keep this file structure intact.
The problem with the glob answer is that it only does so much, i.e. it's not fully recursive. The problem with the copy_tree answer is that the files that are copied will be left behind on an uninstall.
The proper solution is a recursive one which will let you set the package_data parameter in the setup call.
I've written this small method to do this:
import os

def package_files(directory):
    paths = []
    for (path, directories, filenames) in os.walk(directory):
        for filename in filenames:
            # the '..' prefix is needed because package_data paths are
            # resolved relative to the package directory
            paths.append(os.path.join('..', path, filename))
    return paths
extra_files = package_files('path_to/extra_files_dir')

setup(
    ...
    packages=['package_name'],
    package_data={'': extra_files},
    ....
)
You'll notice that when you do a pip uninstall package_name, that you'll see your additional files being listed (as tracked with the package).
Use Setuptools instead of distutils.
Use data files instead of package data. These do not require __init__.py.
Generate the lists of files and directories using standard Python code, instead of writing them literally:
data_files = []
directories = glob.glob('data/subfolder?/subfolder??/')
for directory in directories:
    files = glob.glob(directory + '*')
    data_files.append((directory, files))
# then pass data_files to setup()
To add all the subfolders using package_data in setup.py:
add as many * entries as your deepest subdirectory level, e.g.
package_data={
    'mypackage.data.folderA': ['*', '*/*', '*/*/*'],
}
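If the nesting depth varies, a tiny helper (hypothetical, not part of setuptools) can generate those patterns instead of writing them by hand:
# hypothetical helper: builds ['*', '*/*', '*/*/*', ...] down to max_depth levels
def glob_patterns(max_depth):
    return ['/'.join(['*'] * depth) for depth in range(1, max_depth + 1)]

package_data={'mypackage.data.folderA': glob_patterns(3)}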
Use glob to select all subfolders in your setup.py
...
packages=['your_package'],
package_data={'your_package': ['data/**/*']},
...
Update
According to the changelog, setuptools now supports recursive globs, using **, in package_data (as of v62.3.0, released May 2022).
Original answer
@gbonetti's answer, using a recursive glob pattern, i.e. **, would be perfect.
However, as commented by @daniel-himmelstein, that did not yet work in setuptools package_data.
So, for the time being, I like to use the following workaround, based on pathlib's Path.glob():
from pathlib import Path

def glob_fix(package_name, glob):
    # this assumes setup.py lives in the folder that contains the package
    package_path = Path(f'./{package_name}').resolve()
    return [str(path.relative_to(package_path))
            for path in package_path.glob(glob)]
This returns a list of path strings relative to the package path, as required.
Here's one way to use this:
setuptools.setup(
    ...
    package_data={'my_package': [*glob_fix('my_package', 'my_data_dir/**/*'),
                                 'my_other_dir/some.file', ...], ...},
    ...
)
The glob_fix() can be removed as soon as setuptools supports ** in package_data.
If you don't have any problem with getting your setup.py code dirty, use distutils.dir_util.copy_tree.
The whole problem is how to exclude files from it.
Here's the code:
import os.path
from distutils import dir_util
from distutils import sysconfig
from distutils.core import setup

__packagename__ = 'x'

setup(
    name=__packagename__,
    packages=[__packagename__],
)

destination_path = sysconfig.get_python_lib()
package_path = os.path.join(destination_path, __packagename__)
dir_util.copy_tree(__packagename__, package_path, update=1, preserve_mode=0)
Some notes:
This code recursively copies the source code into the destination path.
You can just use the same setup(...), but use copy_tree() to extend the directory you want into the installation path.
The default paths of a distutils installation can be found in its API.
More information about the copy_tree() function of distutils can be found here.
I can suggest a little code to add data_files in setup():
import os

data_files = []
for start_point in (os.path.join(__pkgname__, 'static'),
                    os.path.join(__pkgname__, 'templates')):
    for root, dirs, files in os.walk(start_point):
        root_files = [os.path.join(root, i) for i in files]
        data_files.append((root, root_files))
setup(
    name=__pkgname__,
    description=__description__,
    version=__version__,
    long_description=README,
    ...
    data_files=data_files,
)
I can do this with a script, but seems like a super pain. How can I achieve this in setup.py?
Here is a reusable, simple way:
Add the following function in your setup.py, and call it as per the Usage instructions. This is essentially the generic version of the accepted answer.
from glob import glob
from itertools import chain

def flatten(list_of_lists):
    # helper assumed by the function below: chain nested lists into one flat list
    return list(chain.from_iterable(list_of_lists))

def find_package_data(specs):
    """recursively find package data as per the folders given

    Usage:
        # in setup.py
        setup(...
              include_package_data=True,
              package_data=find_package_data({
                  'package': ('resources', 'static')
              }))

    Args:
        specs (dict): package => list of folder names to include files from

    Returns:
        dict of list of file names
    """
    return {
        package: list(''.join(n.split('/', 1)[1:]) for n in
                      flatten(glob('{}/{}/**/*'.format(package, f), recursive=True) for f in folders))
        for package, folders in specs.items()}
I'm going to throw my solution in here in case anyone is looking for a clean way to include their compiled sphinx docs as data_files.
setup.py
from setuptools import setup
import pathlib
import os

here = pathlib.Path(__file__).parent.resolve()

# Get documentation files from the docs/build/html directory
documentation = [doc.relative_to(here) for doc in here.glob("docs/build/html/**/*") if pathlib.Path.is_file(doc)]

data_docs = {}
for doc in documentation:
    doc_path = os.path.join("your_top_data_dir", "docs")
    path_parts = doc.parts[3:-1]  # remove "docs/build/html", ignore filename
    if path_parts:
        doc_path = os.path.join(doc_path, *path_parts)
    # create all appropriate subfolders and append relative doc path
    data_docs.setdefault(doc_path, []).append(str(doc))

setup(
    ...
    include_package_data=True,
    # <sys.prefix>/your_top_data_dir
    data_files=[("your_top_data_dir", ["data/test-credentials.json"]), *list(data_docs.items())]
)
With the above solution, once you install your package you'll have all the compiled documentation available at os.path.join(sys.prefix, "your_top_data_dir", "docs"). So, if you wanted to serve the now-static docs using nginx you could add the following to your nginx file:
location /docs {
    # handle static files directly, without forwarding to the application
    alias /www/your_app_name/venv/your_top_data_dir/docs;
    expires 30d;
}
Once you've done that, you should be able to visit {your-domain.com}/docs and see your Sphinx documentation.
If you don't want to add custom code to iterate through the directory contents, you can use the pbr library, which extends setuptools. See here for documentation on how to use it to copy an entire directory, preserving the directory structure:
https://docs.openstack.org/pbr/latest/user/using.html#files
You need to write a function to return all files and their paths; you can use the following:
import os

def sherinfind():
    # Add all folders containing files or other subdirectories
    pathlist = ['templates/', 'scripts/']
    data = {}
    for path in pathlist:
        for root, d_names, f_names in os.walk(path, topdown=True, onerror=None, followlinks=False):
            data[root] = list()
            for f in f_names:
                data[root].append(os.path.join(root, f))
    fn = [(k, v) for k, v in data.items()]
    return fn
Now change the data_files in setup() as follows,
data_files=sherinfind()

Directory is not being recognized in Python

I'm uploading a zipped folder that contains a folder of text files, but it's not detecting that the folder that is zipped up is a directory. I think it might have something to do with requiring an absolute path in the os.path.isdir call, but I can't seem to figure out how to implement that.
zipped = zipfile.ZipFile(request.FILES['content'])
for libitem in zipped.namelist():
    if libitem.startswith('__MACOSX/'):
        continue
    # If it's a directory, open it
    if os.path.isdir(libitem):
        print "You have hit a directory in the zip folder -- we must open it before continuing"
        for item in os.listdir(libitem):
The file you've uploaded is a single zip file which is simply a container for other files and directories. All of the Python os.path functions operate on files on your local file system, which means you must first extract the contents of your zip before you can use os.path or os.listdir.
Also note that the names returned by namelist() are not paths on disk, so os.path.isdir() cannot classify them; within the archive itself, a directory entry is only marked by a trailing slash in its name (and, in Python 3.6+, by ZipInfo.is_dir()).
A rewrite of your code which does an extract first may look something like this:
import os
import shutil
import tempfile
import zipfile

# Create a temporary directory into which we can extract zip contents.
tmpdir = tempfile.mkdtemp()
try:
    zipped = zipfile.ZipFile(request.FILES['content'])
    zipped.extractall(tmpdir)
    # Walk through the extracted directory structure doing what you
    # want with each file.
    for (dirpath, dirnames, filenames) in os.walk(tmpdir):
        # Look into subdirectories?
        for dirname in dirnames:
            full_dir_path = os.path.join(dirpath, dirname)
            # Do stuff in this directory
        for filename in filenames:
            full_file_path = os.path.join(dirpath, filename)
            # Do stuff with this file.
finally:
    # Clean up the temporary directory recursively.
    shutil.rmtree(tmpdir)
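If you only need to know which entries are directories, without extracting anything, a short sketch using the zipfile API (ZipInfo.is_dir() exists in Python 3.6+; on older versions, check for a trailing slash in the name):
import zipfile

zipped = zipfile.ZipFile(request.FILES['content'])
for info in zipped.infolist():
    # directory entries are stored with a trailing '/' in their name
    if info.is_dir():
        print('directory:', info.filename)
    else:
        print('file:', info.filename)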
Usually, to make things handle relative paths when running scripts, you'd want to use os.path.
It seems to me that you're reading the items from a ZipFile without having actually unzipped it, so why would you expect the files/dirs to exist on disk?
Usually I'd print os.getcwd() to find out where I am, and also use os.path.join to join with the root of the data directory (whether that is the same as the directory containing the script, I can't tell), using something like scriptdir = os.path.dirname(os.path.abspath(__file__)).
I'd expect you would have to do something like:
libitempath = os.path.join(scriptdir, libitem)
if os.path.isdir(libitempath):
    ....
But I'm guessing at what you're doing, as it's a little unclear to me.

Creating a relative symlink in python without using os.chdir()

Say I have a path to a file:
/path/to/some/directory/file.ext
In python, I'd like to create a symlink in the same directory as the file, that
points to the file. I'd like to end up with this:
/path/to/some/directory/symlink -> file.ext
I can do this fairly easily using os.chdir() to cd into the directory and
create the symlinks. But os.chdir() is not thread safe, so I'd like to avoid
using it. Assuming that the current working directory of the process is not
the directory with the file (os.getcwd() != '/path/to/some/directory'),
what's the best way to do this?
I guess I could create a busted link in whatever directory I'm in, then
move it to the directory with the file:
import os, shutil
os.symlink('file.ext', 'symlink')
shutil.move('symlink', '/path/to/some/directory/.')
Is there a better way to do this?
Note, I don't want to end up with this:
/path/to/some/directory/symlink -> /path/to/some/directory/file.ext
You can also use os.path.relpath() so that you can use symlinks with relative paths. Say your script is in a directory foo/ and this directory has subdirectories src/ and dst/, and you want to create relative symlinks in dst/ to point to the files in src/. To do so, you can do:
import os
from glob import glob

for src_path in glob('src/*'):
    os.symlink(
        os.path.relpath(
            src_path,
            'dst/'
        ),
        os.path.join('dst', os.path.basename(src_path))
    )
Listing the contents of dst/ then shows:
1.txt -> ../src/1.txt
2.txt -> ../src/2.txt
Relative symlinks are useful if you want to create a tarball of the whole foo directory tree, as I don't believe tar rewrites absolute symlink targets to be relative inside the generated tarball.
You could just set the second argument to the destination path, like:
import os
os.symlink('file.ext', '/path/to/some/directory/symlink')
os.symlink stores the first argument verbatim, and a relative target is resolved against the link's own directory, so this gives exactly symlink -> file.ext.
A python function to create a relative symlink:
import os

def relative_symlink(src, dst):
    # create a symlink at dst pointing to src via a relative path
    dir = os.path.dirname(dst)
    rel_src = os.path.relpath(src, dir)
    # note: join with dst's own basename, so the link is created where dst says
    return os.symlink(rel_src, os.path.join(dir, os.path.basename(dst)))
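Applied to the original question, this yields the desired link:
relative_symlink('/path/to/some/directory/file.ext',
                 '/path/to/some/directory/symlink')
# result: /path/to/some/directory/symlink -> file.ext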
Nowadays, this can be accomplished using pathlib
from pathlib import Path
target = Path('../target.txt')
my_symlink = Path('symlink.txt')
my_symlink.symlink_to(target)
where target is a relative Path or str.
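Applied to the question's layout, a short pathlib sketch (paths assumed):
from pathlib import Path

link = Path('/path/to/some/directory/symlink')
# symlink_to() stores the target as given, so a bare filename is
# resolved relative to the link's own directory
link.symlink_to('file.ext')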
