distutils Extension arguments -- include vs depends vs source? - python

I'm trying to understand the dependency structure of pandas' cython extensions in setup.py.
distutils.extension.Extension has arguments sources, depends, and include_dirs, and I can't figure out the difference between these. In particular, there are a bunch of places in the pandas case where I can delete entries in depends (or pxdfiles) without breaking the build.
What is the distinction between these three arguments?
Update following the answer from @phd:
I appreciate the thought; I'll try to communicate the source of my confusion better.
In the pandas setup.py file linked above, the pandas._libs.tslib extension is passed to distutils.extension.Extension with the args/kwargs:
ext = Extension('pandas._libs.tslib',
                sources=['pandas/_libs/tslib.pyx',
                         'pandas/_libs/src/util.pxd',
                         'pandas/_libs/src/datetime/np_datetime.c',
                         'pandas/_libs/src/datetime/np_datetime_strings.c',
                         'pandas/_libs/src/period_helper.c'],
                depends=['pandas/_libs/src/datetime/np_datetime.h',
                         'pandas/_libs/src/datetime/np_datetime_strings.h',
                         'pandas/_libs/src/period_helper.h',
                         'pandas/_libs/src/datetime.pxd'],
                include_dirs=['pandas/_libs/src/klib', 'pandas/_libs/src'])
Take e.g. util.pxd in the sources entry. Is this not redundant with the presence of pandas/_libs/src in the include_dirs entry? tslib imports directly from datetime.pxd which has "imports" of the form cdef extern from "datetime/np_datetime.h" and cdef extern from "datetime/np_datetime_strings.h". Are those "allowed" because of the presence of the "*.c" files in the sources or the "*.h" files in the depends or both or...
I've tried a whole bunch of permutations of removing subsets of these dependencies and have not seen much of a pattern in which ones break the build.

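See the detailed docs and the source code for the build_ext command.
sources: a list of source files (*.c, *.cpp, ...) that are compiled and linked into the extension.
depends: a list of additional files the extension depends on, typically headers. They are not compiled; build_ext checks their timestamps, together with those of sources, to decide whether the extension is out of date and needs rebuilding.
include_dirs: a list of directories where the compiler looks for included (header) files (*.h).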
pxdfiles are Cython-specific.
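To make the role of depends concrete: as far as I can tell from the build_ext source, it concatenates sources + depends and skips the extension unless --force is given or one of those files is newer than the built module (via distutils' newer_group with missing='newer'). The sketch below re-implements that rule by hand; needs_rebuild is a hypothetical name, not a distutils function, and it compares float mtimes where distutils uses os.stat.

```python
import os
from typing import Iterable


def needs_rebuild(target: str, sources: Iterable[str], depends: Iterable[str]) -> bool:
    """Hypothetical approximation of build_ext's "is this extension stale?" check.

    Mirrors newer_group(sources + depends, target, missing='newer'):
    rebuild if the target is missing, if any input is missing, or if any
    input was modified more recently than the target.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    for path in list(sources) + list(depends):
        if not os.path.exists(path):
            return True  # missing='newer': a missing input forces a rebuild
        if os.path.getmtime(path) > target_mtime:
            return True
    return False
```

This is why deleting entries from depends doesn't break a clean build: it only means that editing such a header may leave a stale extension behind until you rebuild with python setup.py build_ext --force.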

Related

How to include a particular directory for checking with mypy

In the mypy.ini file I have:
[mypy]
exclude = ['tests', 'build']
This works all right as long as both of these directories exist; if one of them doesn't exist, I get the following error message:
There are no .py[i] files in directory '.'
I'm wondering if there's a way to exclude these directories without this error - or if not - if there's a way to explicitly tell mypy which directories it should check.
The issue is that exclude expects a regular expression in the INI file (see docs). I'm not great with regex, but you can start off with
[mypy]
exclude = (build|tests)
The documentation on the --exclude parameter is also useful (relevant parts copied below):
A regular expression that matches file names, directory names and paths which mypy should ignore while recursively discovering files to check. Use forward slashes on all platforms.
For instance, to avoid discovering any files named setup.py you could pass --exclude '/setup.py$'. Similarly, you can ignore discovering directories with a given name by e.g. --exclude /build/ or those matching a subpath with --exclude /project/vendor/. To ignore multiple files / directories / paths, you can provide the --exclude flag more than once, e.g. --exclude '/setup.py$' --exclude '/build/'.
Also see https://github.com/python/mypy/issues/10310 for relevant discussion.
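Because exclude is a regex that can match anywhere in a path, (build|tests) also excludes a file such as src/rebuild_cache.py. The helper below is a hypothetical approximation of how the pattern is applied, assuming (as I understand mypy's behaviour and the docs above) a relative path with forward slashes, a trailing "/" for directories, and re.search; it is not mypy's own code. It shows why an anchored pattern such as ^(build|tests)/ is tighter.

```python
import re


def is_excluded(relpath: str, is_dir: bool, pattern: str) -> bool:
    """Hypothetical approximation of applying a mypy `exclude` regex to one path.

    Assumptions: the path is relative, uses forward slashes, gets a trailing
    '/' when it is a directory, and the regex is applied with re.search, so
    it may match anywhere in the path unless it is anchored.
    """
    path = relpath.replace("\\", "/")  # "Use forward slashes on all platforms"
    if is_dir:
        path += "/"
    return re.search(pattern, path) is not None
```

Under those assumptions, exclude = ^(build|tests)/ in mypy.ini would skip only the top-level build and tests directories.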

How to guess a file extension with python?

I recently tried to write Python code that takes the path of a file without an extension and determines what extension it has.
I was looking for something like the example below. In the example the extension is exe (but the code doesn't know that yet).
path = 'C:\\MyPath\\Example'
#takes the path above and guesses the programs extension:
extension = guess_extension(path)
#adds the extension to the path:
fullPath = path+extension
print(fullPath)
Output:
C:\MyPath\Example.exe
If you know a python module that would do that (or something similar), please list it below.
I have tried the filetype (filetype.guess()) and mimetypes (mimetypes.guess_extension()) modules, but both returned None.
I have also tried answers from many questions like this one, but they still didn't work.
It sounds like the built in glob module (glob docs) might be what you're looking for. This module provides Unix style pattern expansion functionality within Python.
In the following example, the incomplete path has the string '.*' appended to it when it is passed to glob.glob. This tells glob.glob to return a list of existing paths that start with path, followed by a period (designating a file extension), with the asterisk matching any characters that follow path + '.'.
import glob
path = r'C:\Program Files\Firefox Developer Edition\minidump-analyzer'
full = glob.glob(path+'.*')
print(full[0])
Output: C:\Program Files\Firefox Developer Edition\minidump-analyzer.exe
It is worth noting that the above is just an illustration of how glob could be leveraged as part of a solution to your question. Proper handling of unexpected inputs, edge cases, exceptions etc. should be implemented as required by the needs of your program.
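Along those lines, a slightly more defensive sketch (Python 3.4+ for glob.escape): it returns None instead of raising IndexError when nothing matches, refuses to guess when several files share the name, skips directories, and keeps multi-part extensions such as .tar.gz. guess_extension mirrors the hypothetical function from the question; neither name is a library API.

```python
import glob
import os
from typing import List, Optional


def guess_extensions(path: str) -> List[str]:
    """Return the extension of every existing file named path + '.<something>'."""
    # glob.escape stops characters such as '[' in the path from acting as wildcards
    matches = sorted(glob.glob(glob.escape(path) + ".*"))
    base = os.path.basename(path)
    # Keep everything after the base name, so 'Archive.tar.gz' yields '.tar.gz'
    return [os.path.basename(m)[len(base):] for m in matches if os.path.isfile(m)]


def guess_extension(path: str) -> Optional[str]:
    """Hypothetical helper: the single matching extension, or None if 0 or 2+ match."""
    extensions = guess_extensions(path)
    return extensions[0] if len(extensions) == 1 else None
```

With this, fullPath = path + extension should only be built after checking that guess_extension(path) is not None.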

how does pycparser reads the header files listed in includes in C code files?

I am trying to parse a C file using pycparser. I'd like to know whether, while preprocessing the C file, pycparser reads only the library headers provided in the fake lib folder (if you pass the fake lib path in cpp_args), or whether it also reads from the locations named in #include statements, such as this line:
#include<folder1/folder2/xyz.h>
Where will pycparser search for xyz.h? Only in the fake lib folder?
It will search directories other than the fake folder. If you look in the file pycparser/__init__.py, you'll find a function called parse_file which, when use_cpp=True, calls preprocess_file; that function invokes the C preprocessor on your input file and returns its output as a string, which parse_file then hands to the parser. The preprocessor resolves the #include lines itself, searching the directories passed as -I options in cpp_args as well as its own system include directories. The code in each of these functions is fairly clear and well commented, so give it a read and see if it makes sense.
The fake folder is included only for standard library headers like stdlib.h, stdio.h and so forth. Those headers often contain non-portable compiler-specific extensions; chances are, you'll only need to know that there's a function printf(...) in order to be able to parse your code.
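So for <folder1/folder2/xyz.h> to resolve, the directory that contains folder1/ has to be on the preprocessor's search path next to the fake folder. A minimal sketch, assuming a GCC-style cpp that accepts -I options; build_cpp_args and parse_with_includes are hypothetical helper names, while parse_file(filename, use_cpp=..., cpp_args=...) is pycparser's own function.

```python
from typing import List, Sequence


def build_cpp_args(fake_libc_dir: str, include_dirs: Sequence[str]) -> List[str]:
    """Hypothetical helper: -I options for the C preprocessor, in search order.

    A GCC-style cpp searches -I directories left to right, before its own
    system include directories, so the fake libc headers (stdio.h, stdlib.h,
    ...) win over the real ones, and <folder1/folder2/xyz.h> is found in
    whichever listed directory contains folder1/.
    """
    return ["-I" + fake_libc_dir] + ["-I" + d for d in include_dirs]


def parse_with_includes(c_file: str, fake_libc_dir: str, include_dirs: Sequence[str]):
    """Hypothetical helper: preprocess and parse c_file (needs pycparser and cpp)."""
    from pycparser import parse_file  # lazy import so build_cpp_args works without it

    return parse_file(
        c_file,
        use_cpp=True,
        cpp_args=build_cpp_args(fake_libc_dir, include_dirs),
    )
```

For example, parse_with_includes("main.c", "path/to/fake_libc_include", ["path/to/project"]) would let cpp find path/to/project/folder1/folder2/xyz.h (all paths here are placeholders).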

Bundling GTK resources with py2exe

I'm using Python 2.6 and PyGTK 2.22.6 from the all-in-one installer on Windows XP, trying to build a single-file executable (via py2exe) for my app.
My problem is that when I run my app as a script (i.e. not built into an .exe file, just as a loose collection of .py files), it uses the native-looking Windows theme, but when I run the built exe I see the default GTK theme.
I know that this problem can be fixed by copying a bunch of files into the dist directory created by py2exe, but everything I've read involves manually copying the data, whereas I want this to be an automatic part of the build process. Furthermore, everything on the topic (including the FAQ) is out of date - PyGTK now keeps its files in C:\Python2x\Lib\site-packages\gtk-2.0\runtime\..., and just copying the lib and etc directories doesn't fix the problem.
My questions are:
1. I'd like to be able to programmatically find the GTK runtime data in setup.py rather than hard-coding paths. How do I do this?
2. What are the minimal resources I need to include?
Update: I may have almost answered #2 by trial and error. For the "wimp" (i.e. MS Windows) theme to work, I need the files from:
runtime\lib\gtk-2.0\2.10.0\engines\libwimp.dll
runtime\etc\gtk-2.0\gtkrc
runtime\share\icons\*
runtime\share\themes\MS-Windows
...without the runtime prefix, but otherwise with the same directory structure, sitting directly in the dist directory produced by py2exe. But where does the 2.10.0 come from, given that gtk.gtk_version is (2,22,0)?
Answering my own question here, but if anyone knows better, feel free to answer too. Some of it seems quite fragile (e.g. version numbers in paths), so comment or edit if you know a better way.
1. Finding the files
Firstly, I use this code to actually find the root of the GTK runtime. This is very specific to how you install the runtime, though, and could probably be improved with a number of checks for common locations:
#gtk file inclusion
import os
import gtk
# The runtime dir sits next to the gtk package directory (both live in site-packages\gtk-2.0):
GTK_RUNTIME_DIR = os.path.join(
    os.path.split(os.path.dirname(gtk.__file__))[0], "runtime")
assert os.path.exists(GTK_RUNTIME_DIR), "Cannot find GTK runtime data"
2. What files to include
This depends on (a) how much of a concern size is, and (b) the context of your application's deployment. By that I mean, are you deploying it to the whole wide world where anyone can have an arbitrary locale setting, or is it just for internal corporate use where you don't need translated stock strings?
If you want Windows theming, you'll need to include:
GTK_THEME_DEFAULT = os.path.join("share", "themes", "Default")
GTK_THEME_WINDOWS = os.path.join("share", "themes", "MS-Windows")
GTK_GTKRC_DIR = os.path.join("etc", "gtk-2.0")
GTK_GTKRC = "gtkrc"
GTK_WIMP_DIR = os.path.join("lib", "gtk-2.0", "2.10.0", "engines")
GTK_WIMP_DLL = "libwimp.dll"
If you want the Tango icons:
GTK_ICONS = os.path.join("share", "icons")
There is also localisation data (which I omit, but you might not want to):
GTK_LOCALE_DATA = os.path.join("share", "locale")
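On the fragile 2.10.0 path component: as far as I know it is GTK 2's binary (module ABI) version, which is why it differs from gtk.gtk_version. Rather than hard-coding it, setup.py can look it up on disk. A sketch, assuming the runtime layout described above (lib\gtk-2.0\<version>\engines\libwimp.dll under GTK_RUNTIME_DIR); find_wimp_dir is a hypothetical helper, written to run on Python 2.6 as well as Python 3.

```python
import os


def find_wimp_dir(runtime_dir):
    """Hypothetical helper: lib/gtk-2.0/<version>/engines, relative to runtime_dir.

    The <version> directory is discovered by looking for libwimp.dll instead
    of being hard-coded. Returns None if no such directory exists and raises
    ValueError if more than one version directory contains the DLL.
    """
    gtk_lib = os.path.join(runtime_dir, "lib", "gtk-2.0")
    if not os.path.isdir(gtk_lib):
        return None
    found = []
    for version in sorted(os.listdir(gtk_lib)):
        engines = os.path.join("lib", "gtk-2.0", version, "engines")
        if os.path.isfile(os.path.join(runtime_dir, engines, "libwimp.dll")):
            found.append(engines)
    if len(found) > 1:
        raise ValueError("several GTK engine directories: %r" % (found,))
    return found[0] if found else None
```

GTK_WIMP_DIR = find_wimp_dir(GTK_RUNTIME_DIR) could then replace the hard-coded constant, followed by an assert that it is not None.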
3. Piecing it together
Firstly, here's a function that walks the filesystem tree at a given point and produces output suitable for the data_files option.
def generate_data_files(prefix, tree, file_filter=None):
    r"""
    Walk the filesystem starting at "prefix" + "tree", producing a list of files
    suitable for the data_files option to setup(). The prefix will be omitted
    from the path given to setup(). For example, if you have
        C:\Python26\Lib\site-packages\gtk-2.0\runtime\etc\...
    ...and you want your "dist\" dir to contain "etc\..." as a subdirectory,
    invoke the function as
        generate_data_files(
            r"C:\Python26\Lib\site-packages\gtk-2.0\runtime",
            r"etc")
    If, instead, you want it to contain "runtime\etc\..." use:
        generate_data_files(
            r"C:\Python26\Lib\site-packages\gtk-2.0",
            r"runtime\etc")
    Empty directories are omitted.
    file_filter(root, fl) is an optional function called with the containing
    directory and filename of each file. If it returns False, the file is
    omitted from the results.
    """
    data_files = []
    for root, dirs, files in os.walk(os.path.join(prefix, tree)):
        to_dir = os.path.relpath(root, prefix)
        if file_filter is not None:
            file_iter = (fl for fl in files if file_filter(root, fl))
        else:
            file_iter = files
        data_files.append((to_dir, [os.path.join(root, fl) for fl in file_iter]))
    non_empties = [(to, fro) for (to, fro) in data_files if fro]
    return non_empties
So now you can call setup() like so:
setup(
    # Other setup args here...
    data_files = (
        # Use the function above...
        generate_data_files(GTK_RUNTIME_DIR, GTK_THEME_DEFAULT) +
        generate_data_files(GTK_RUNTIME_DIR, GTK_THEME_WINDOWS) +
        generate_data_files(GTK_RUNTIME_DIR, GTK_ICONS) +
        # ...or include single files manually
        [
            (GTK_GTKRC_DIR, [
                os.path.join(GTK_RUNTIME_DIR,
                             GTK_GTKRC_DIR,
                             GTK_GTKRC)
            ]),
            (GTK_WIMP_DIR, [
                os.path.join(
                    GTK_RUNTIME_DIR,
                    GTK_WIMP_DIR,
                    GTK_WIMP_DLL)
            ])
        ]
    )
)

How to specify header files in setup.py script for Python extension module?

How do I specify the header files in a setup.py script for a Python extension module? Listing them with the source files as follows does not work, but I cannot figure out where else to list them.
from distutils.core import setup, Extension
from glob import glob

setup(
    name = "Foo",
    version = "0.1.0",
    ext_modules = [Extension('Foo', glob('Foo/*.cpp') + glob('Foo/*.h'))]
)
Add a MANIFEST.in file next to setup.py with the following contents:
graft relative/path/to/directory/of/your/headers/
Try the headers kwarg to setup(). I don't know that it's documented anywhere, but it works.
setup(name='mypkg', ..., headers=['src/includes/header.h'])
I've had so much trouble with setuptools it's not even funny anymore.
Here's the workaround I ended up having to use to produce a working source distribution with header files: I used package_data.
I'm sharing this in order to potentially save someone else the aggravation. If you know a better working solution, let me know.
See here for details:
https://bitbucket.org/blais/beancount/src/ccb3721a7811a042661814a6778cca1c42433d64/setup.py?fileviewer=file-view-default#setup.py-36
# A note about setuptools: It's profoundly BROKEN.
#
# - The header files are needed in order to distribute a working
# source distribution.
# - Listing the header files under the extension "sources" fails to
# build; distutils cannot make out the file type.
# - Listing them as "headers" makes them ignored; extra options to
# Extension() appear to be ignored silently.
# - Listing them under setup()'s "headers" makes it recognize them, but
# they do not get included.
# - Listing them with "include_dirs" of the Extension fails as well.
#
# The only way I managed to get this working is by working around and
# including them as "packaged data" (see {63fc8d84d30a} below). That
# includes the header files in the sdist, and a source distribution can
# be installed using pip3 (and be built locally). However, the header
# files end up being installed next to the pure Python files in the
# output. This is the sorry situation we're living in, but it works.
There's a corresponding ticket in my OSS project:
https://bitbucket.org/blais/beancount/issues/72
If I remember right, you should only need to specify the source files, and it's supposed to find/use the headers.
I believe the distutils manual mentions this:
"For example, if your extension requires header files in the include directory under your distribution root, use the include_dirs option"
Extension('foo', ['foo.c'], include_dirs=['include'])
http://docs.python.org/distutils/setupscript.html#preprocessor-options
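Putting the answers together: headers go in depends and their directory in include_dirs, never in sources (where distutils cannot work out the file type). A sketch of a hypothetical helper, extension_kwargs, that splits a directory's files accordingly; the suffix lists are my assumption of what counts as a C/C++ source or header, not an official distutils list.

```python
import glob
import os
from typing import Dict, List

# Assumed suffixes: translation units that get compiled vs. headers that do not.
SOURCE_SUFFIXES = (".c", ".cc", ".cpp", ".cxx")
HEADER_SUFFIXES = (".h", ".hh", ".hpp", ".hxx")


def extension_kwargs(src_dir: str) -> Dict[str, List[str]]:
    """Hypothetical helper: split src_dir's files into Extension(...) keyword arguments."""
    files = sorted(glob.glob(os.path.join(glob.escape(src_dir), "*")))
    return {
        "sources": [f for f in files if f.endswith(SOURCE_SUFFIXES)],  # compiled and linked
        "depends": [f for f in files if f.endswith(HEADER_SUFFIXES)],  # only checked for changes
        "include_dirs": [src_dir],  # lets the compiler find e.g. #include <foo.h>
    }
```

In setup.py this would be used as Extension('Foo', **extension_kwargs('Foo')). Listing headers in depends may not be enough to get them into a source distribution, which is what the MANIFEST.in answer above addresses.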
