Sphinx usually builds the documentation incrementally, which means that only files that have been changed are regenerated. I am wondering if there is a way to tell Sphinx to always regenerate certain files which may not have been changed directly but are influenced by changes in other files. More specifically: Is there a way to tell Sphinx to always regenerate files that contain a certain directive?
The documentation I am working on relies quite frequently on directives that collect and reformat information from other pages. A clean (make clean && make [html]) and/or full (sphinx-build -a) build takes significantly longer than an incremental build. Additionally, manually keeping track of files which contain the directive might be complicated: the documentation is written by 10+ authors with limited experience in writing Sphinx documentation.
But even in less complex scenarios you might face this 'issue':
For instance, sphinx.ext.todo contains a directive called todolist which collects todos from the whole documentation. If I create a file containing all the todos from my documentation (basically an empty document just containing the todolist directive), the list is not updated until I make a clean build or alter that file itself.
If you want to test it yourself: Create a documentation with sphinx-quickstart and stick to the default values except for
'> todo: write "todo" entries that can be shown or hidden on build (y/n) [n]: y'
Add a file in source called todos.rst and reference this file from index.rst.
Content of the index.rst:
Welcome to sphinx-todo's documentation!
=======================================

.. toctree::
   :maxdepth: 2

   todos

.. todo::

   I have to do this

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
Content of todos.rst:
.. _`todos`:

List of ToDos
=============

.. todolist::
Assuming you use the html output, you will notice that todos.html does not change when you add todos to index.rst.
tl;dr: How -- if possible -- do I include files containing a specific directive (e.g. todolist) in an incremental build of Sphinx without having to keep track of them manually?
By default, Sphinx only updates output for new or changed files; forcing it to write all output files is buried under the -a option of sphinx-build.
At the end of the documentation of the command options for sphinx-build it also says:
You can also give one or more filenames on the command line after the source and build directories. Sphinx will then try to build only these output files (and their dependencies).
You could either invoke sphinx-build directly or through your makefile, depending on the makefile that shipped with your version of Sphinx (you can customize the makefile, too).
Just for the record: I benchmarked several solutions.
I created a function called touch_files in my conf.py. It searches for strings in files and -- if found -- touches the file to trigger a rebuild:
import fnmatch
import mmap
import os

def touch_files(*args):
    # recursively search the 'source' directory
    for root, dirnames, filenames in os.walk('.'):
        # check all rst files
        for filename in fnmatch.filter(filenames, '*.rst'):
            cur = os.path.join(root, filename)
            # skip empty files, which mmap cannot handle
            if os.path.getsize(cur) == 0:
                continue
            with open(cur) as f:
                # access the content directly from disk
                s = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
                # mmap works on bytes, so encode the search strings
                if any(s.find(d.encode()) != -1 for d in args):
                    # if one of the patterns has been found, update the
                    # modification time of the file to trigger a rebuild
                    os.utime(cur, None)

# actually call the function
touch_files('.. todolist::')
touch_files can be called with a variable number of arguments and will touch a file when at least ONE of the arguments has been found in it. I tried to optimize the function with regular expressions, but this did not achieve much. Reading the file content directly from disk with mmap seemed to have only a minor impact.
These are the results for 78 files in total, of which 36 contain one of two directives.
Command                                  Time    Comment
time make html                           2.3 s   no changes
time sh -c 'make clean && make html'    13.3 s
time make htmlfull                       9.4 s   sphinx-build -a
time make html                           8.4 s   with 'touch_files'
'touch_files' alone                      0.2 s   measured with timeit
Result: Every command has been called just a few times (except 'touch_files'), so the numbers lack statistical reliability. Sphinx requires roughly 2.3 seconds just to check the documentation for changes without doing anything. A clean build requires 13.3 seconds, which is much longer than a build with sphinx-build -a. If we rebuild just 36 out of 78 files, the build process is slightly faster than sphinx-build -a, although I doubt a significant difference could be shown here. The overhead of 'touch_files' is rather low; finding the strings is quite cheap compared to updating the timestamps.
Conclusion: As Steve Piercy pointed out, using sphinx-build -a seems to be the most reasonable approach, at least for my use case. If the files that do not contain one of the directives in question are what makes a full rebuild slow, touch_files might be useful though.
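For completeness: instead of touching files, one could also hook Sphinx's env-get-outdated event from conf.py and report the affected documents as outdated. I have not benchmarked this; the sketch below assumes the event works as documented (the handler returns extra docnames that Sphinx should re-read):
def reread_docs_with_directive(app, env, added, changed, removed):
    # return additional docnames whose source contains the directive,
    # so Sphinx re-reads them even though their own files are unchanged
    stale = []
    for docname in env.found_docs - added - changed - removed:
        try:
            with open(env.doc2path(docname), encoding='utf-8') as f:
                if '.. todolist::' in f.read():
                    stale.append(docname)
        except OSError:
            pass
    return stale

def setup(app):
    app.connect('env-get-outdated', reread_docs_with_directive)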
Related
I am using Cython as part of my build setup for a large project, driven by CMake.
I can't seem to get Cython to generate the .c files in a sensible location.
My file layout:
C:\mypath\src\demo.py # Cython source file
C:\mypath\build\bin # I want demo.pyd to end up here
C:\mypath\build\projects\cyt\setup.py # Generated by CMake
My setup.py is generated by CMake (a lot of it depends on configure_file), in the location specified above.
This location conforms to the usual structure of the overarching project (which builds over a hundred libraries and executables) and is not something I want to (or can easily) change.
The generated setup.py looks like this:
from distutils.core import setup, Extension
from Cython.Build import cythonize
import os.path

extension_args = {
    'extra_compile_args': ['/DWIN32', '/DWIN64'],
    'extra_link_args': ['/MACHINE:X64'],
}

source = '../../../src/demo.py'

modules = [Extension(
    os.path.splitext(os.path.basename(source))[0],
    sources=[source],
    **extension_args
)]

modules = cythonize(
    modules,
    build_dir='BUILD_DIR',
    compiler_directives={'language_level': 2}
)

setup(name='demo',
      version='0.1',
      description='',
      ext_modules=modules)
(Note that this is heavily simplified compared to the real case, which passes many additional arguments in extension_args, and includes many source files, each with its own object in modules.
Nevertheless, I have verified that the minimised version above reproduces my issue).
Cython is run like this:
cd C:\mypath\build\projects\cyt
python setup.py build_ext --build-lib C:/mypath/build/bin --build-temp C:/mypath/build/projects/cyt
Ideally, I would want all intermediary build artefacts from Cython (the generated C files, object files, exp files, etc.) to reside somewhere in or below C:\mypath\build\projects\cyt.
However, I can't seem to achieve that.
Here is where build artefacts actually end up:
demo.pyd ends up in C:\mypath\build\bin, where I want it. No problem here.
The object file demo.obj, along with the linked files demo.exp and demo.lib, end up in C:\mypath\build\projects\src. I want them inside cyt.
The C file demo.c ends up in C:\mypath\build\src. Again, I want this in projects\cyt.
In the setup.py, I am setting the build_dir parameter for cythonize as suggested in this answer, but it doesn't seem to work as I would like.
I also tried using cython_c_in_temp as per another answer on that question, but that has no effect (and judging from my inspection of Cython source code, does not apply to cythonize calls at all).
I tried using an absolute path for source, but that made things even worse, as the C file ended up generated right next to demo.py, inside the source tree (as C:\mypath\src\demo.c).
My question: How can I make sure that all the generated intermediary files (C, obj, and friends) end up in the same directory as the generated setup.py, or below that?
I can think of two workarounds for my situation, but they both feel like hacks which I would like to avoid:
Copy all the Python source files from their locations in C:\mypath\src to alongside the generated setup.py, so that I can refer to them without .. in the path.
That would likely solve the issue, but burdens the (already long) build process with tens of additional file copy operations I'd rather avoid.
Since the path where the files end up seems to be composed by concatenating "the directory of setup.py + the value of build_dir + the value of source", I could count the number of .. in the source path and specify build_dir deep enough so that the evaluation results in the path I actually want.
This is both extremely hacky and very fragile.
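For illustration only (the 'pad' directory name below is made up), the second workaround would amount to something like:
import os

source = '../../../src/demo.py'
# one extra directory level per '..' in the relative source path, so that
# build_dir joined with source collapses back inside this project directory
depth = source.count('../')
build_dir = os.path.join('BUILD_DIR', *(['pad'] * depth))
# cythonize(modules, build_dir=build_dir, ...) would then place demo.c under
# BUILD_DIR/src/ instead of climbing out of the project tree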
I hope a better solution exists.
So it seems like you've run into a bug. Here's the relevant section of code in Cython. Basically, cythonize will try to build the paths to your .c and .o files like so:
C:/mypath/build/projects/cyt/BUILD_DIR/../../../src/demo.c
and so instead of nicely contained temporary files, you end up with insanity. Using an absolute path to demo.py won't help either, since that same code will just pass absolute paths through unchanged.
There doesn't seem to be a way to fix this in user-space short of extensive monkey-patching, so I submitted a pull-request to Cython with an actual fix. Once that's been merged in, you should then be able to run:
cd C:\mypath\build\projects\cyt
python setup.py build_ext -b C:/mypath/build/bin -t .
to get the result you want (-b and -t are the short forms of --build-lib and --build-temp).
I have two python scripts, scriptA and scriptB, which run on Unix systems. scriptA takes 20s to run and generates a number X. scriptB needs X when it is run and takes around 500ms. I need to run scriptB every day but scriptA only once every month, so I don't want to run scriptA from scriptB. I also don't want to manually edit scriptB each time I run scriptA.
I thought of updating a file through scriptA, but I'm not sure where such a file could ideally be placed so that scriptB can read it later, independent of the location of these two scripts. What is the best way of storing this value X on a Unix system so that it can be used later by scriptB?
Many programs on Linux/Unix keep their configuration in /etc/ and use a subfolder in /var/ for other files.
But you would probably need root privileges for that.
If you run the scripts from your home folder, then you could create a file ~/.scriptB.rc or a folder ~/.scriptB/ or ~/.config/scriptB/.
See also the Filesystem Hierarchy Standard on Wikipedia.
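For example (the path names here are only a suggestion):
import os

# keep the value in the user's home directory so both scripts can find it,
# no matter where the scripts themselves live
config_dir = os.path.join(os.path.expanduser('~'), '.config', 'scriptB')
os.makedirs(config_dir, exist_ok=True)
value_file = os.path.join(config_dir, 'X')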
It sounds like you want to serialize ScriptA's results, save it in a file or database somewhere, then have ScriptB read those results (possibly also modifying the file or updating the database entry to indicate that those results have now been processed).
To make that work you need for ScriptA and ScriptB to agree on the location and format of the data ... and you might want to implement some sort of locking to ensure that ScriptB doesn't end up with corrupted inputs if it happens to be run at the same time that ScriptA is writing or updating the data (and, conversely, that ScriptA doesn't corrupt the data store by writing thereto while ScriptB is accessing it).
Of course ScriptA and ScriptB could each have a filename or other data location hard-coded into their sources. However, that would violate the DRY principle. So you might want them to share a configuration file. (Of course the configuration filename is also repeated in the sources ... or at least the import of the common bit of configuration code ... but the latter still ensures that an installation/configuration detail (the location and, possibly, the format of the data store) is decoupled from the source code. Thus it can be changed, in the shared config, without affecting the rest of the code for either script.)
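As a minimal sketch of that shared bit of configuration code (the module and path names are just examples):
# datastore_config.py -- imported by both ScriptA and ScriptB, so the
# location of the data store is defined in exactly one place
import os

DB_PATH = os.path.join(os.path.expanduser('~'), '.local', 'share', 'myscripts', 'foo.db')
Both scripts would then simply do from datastore_config import DB_PATH.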
As for precisely which type of file and serialization to use ... that's a different question.
These days, as strange as it may sound, I would suggest using SQLite3. It may seem like overkill to use an SQL "database" for simply storing a single value. However, SQLite3 is included in the Python standard library, and it only needs a filename for configuration.
You could also use pickle or JSON or even YAML (which would require a third-party module) ... or even just text or some binary representation using something like struct. However, any of those will require that you parse your results and deal with any parsing or formatting errors. JSON would be the simplest option among these alternatives. Additionally, you would have to do your own file locking and handling if you wanted ScriptA and ScriptB (and, potentially, any other scripts you ever write for manipulating this particular data) to be robust against concurrent operations.
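For reference, the JSON variant could be as small as this (the file name is just an example, and note that it does nothing about locking):
import json

RESULT_FILE = '/var/tmp/scripta_result.json'

# ScriptA: write the value
with open(RESULT_FILE, 'w') as f:
    json.dump({'X': 42}, f)

# ScriptB: read it back
with open(RESULT_FILE) as f:
    X = json.load(f)['X']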
The advantage of SQLite3 is that it handles the parsing and decoding and the locking and concurrency for you. You create the table once (perhaps embedded in ScriptA as a rarely used "--initdb" option for occasions when you need to recreate the data store). Your code to read it might look as simple as:
#!/usr/bin/python
import sqlite3
db = sqlite3.connect('./foo.db')
cur = db.cursor()
results = cur.execute('SELECT value, MAX(date) FROM results').fetchone()[0]
... and writing a new value would look a bit like:
#!/usr/bin/python
# (Same import, db= and cur= from above)
with db:
    cur.execute('INSERT INTO results (value) VALUES (?)', (myvalue,))
All of this assuming you had, at some time, initialized the data store (foo.db in this example) with something like:
#!/usr/bin/python
# (Same import, db= and cur= from above)
with db:
    cur.execute('CREATE TABLE IF NOT EXISTS results (value INTEGER NOT NULL, date TIMESTAMP DEFAULT current_timestamp)')
(Actually, you could just execute that command every time if you wanted your scripts to recover silently from someone cleaning out the old data.)
This might seem like more code than a JSON file-based approach. However, SQLite3 provides ACID (transactional) semantics as well as abstracting away the serialization and deserialization.
Also note that I'm glossing over a few details. My examples above actually create a whole table of results, with timestamps for when they were written to your data store. These would accumulate over time and, if you were using this approach, you would periodically want to clean up your "results" table with a command like:
#!/usr/bin/python
# (Same import, db= and cur= from above)
with db:
    cur.execute('DELETE FROM results WHERE date < ?',
                cur.execute('SELECT MAX(date) FROM results').fetchone())
Alternatively, if you really never want to have access to your prior results, then change the INSERT into an UPDATE like so:
#!/usr/bin/python
# (Same import, db= and cur= from above)
with db:
    cur.execute('UPDATE results SET value=(?)', (mynewvalue,))
(Also note that (mynewvalue,) is a single-element tuple. The DB-API requires that query parameters be wrapped in a tuple, which is easy to forget when you first start using it with single parameters such as this.)
Obviously, if you took this UPDATE-only approach you could drop the 'date' column from the 'results' table and all those references to MAX(date) from the queries.
I chose to use the slightly more complex schema in my earlier examples because it allows your scripts to be a bit more robust with very little additional complexity. You could then do other error checking, for example detecting missing values where ScriptB finds that ScriptA hasn't been run as intended.
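For instance, ScriptB could check for an empty data store before proceeding; a sketch in the same style as the snippets above:
#!/usr/bin/python
# (Same import, db= and cur= from above)
import sys
row = cur.execute('SELECT value, MAX(date) FROM results').fetchone()
if row is None or row[0] is None:
    # ScriptA has apparently never run, or the table has been cleaned out
    sys.exit('no result from ScriptA found in the data store')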
Edit/run crontab -e:
# this will run every month on the 25th at 2am
0 2 25 * * python /path/to/scriptA.py > /dev/null
# this will run every day at 2:10 am
10 2 * * * python /path/to/scriptB.py > /dev/null
Create an external file for both scripts:
In scriptA:
>>> with open('/path/to/test_doc','w+') as f:
... f.write('1')
...
In scriptB:
>>> with open('/path/to/test_doc','r') as f:
... v = f.read()
...
>>> v
'1'
You can take a look at PyPubSub
It's a Python package which provides a publish-subscribe API that facilitates event-based programming.
It'll give you an OS-independent solution to your problem and only requires a few additional lines of code in both A and B.
Also you don't need to handle messy files!
Assuming you are not running the two scripts at the same time, you can (pickle and) save the go-between object anywhere, so long as you point to the same system path when you save and load the file. For example:
import pickle # or import cPickle as pickle
# Create a python object like a dictionary, list, etc.
favorite_color = { "lion": "yellow", "kitty": "red" }
# Write to file ScriptA
f_myfile = open('C:\\My Documents\\My Favorite Folder\\myfile.pickle', 'wb')
pickle.dump(favorite_color, f_myfile)
f_myfile.close()
# Read from file ScriptB
f_myfile = open('C:\\My Documents\\My Favorite Folder\\myfile.pickle', 'rb')
favorite_color = pickle.load(f_myfile) # variables come out in the order you put them in
f_myfile.close()
I would like to be able to check from Python if a given string could be a valid cross-platform folder name - below is the concrete problem I ran into (a folder name ending in .), but I'm sure there are more special cases (e.g. con, etc.).
Is there a library for this?
From Python (3.2) I created a folder on Windows (7) with a name ending in a dot ('.'), e.g. (without the square brackets): [What I've done on my holidays, Part II.]
When the created folder was ftp'd (to Linux, but I guess that's irrelevant), it no longer had the dot in it (and this, in turn, broke a lot of hyperlinks).
I've checked it from the command line, and it seems that the folder doesn't have the '.' in its name:
mkdir tmp.
dir
cd tmp
cd ..\tmp.
Apparently, adding a single dot at the end of the folder name is ignored, e.g.:
cd c:\Users.
works just as expected.
Nope, there's sadly no way to do this. For Windows you can basically use the following code to remove all illegal characters - but if someone still has a FAT filesystem you'd have to handle that too, since FAT is stricter. Basically you'll have to read the documentation for all filesystems and come up with a complete list. Here's the NTFS one as a starting point:
import re

# <>:"/\|?* and the ASCII control characters 0-31 are not allowed in NTFS names
ILLEGAL_NTFS_CHARS = r'[<>:"/\\|?*\x00-\x1f]'

def __removeIllegalChars(name):
    # removes characters that are invalid for NTFS
    return re.sub(ILLEGAL_NTFS_CHARS, "", name)
And then you need some "forbidden" name list as well to get rid of the likes of COM. Pretty much a complete mess, that... and that's ignoring Linux (although there it's pretty relaxed, afaik).
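A rough sketch of those additional checks (the reserved-name list is the one from Microsoft's file-naming documentation: CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9; treat this as a starting point, not a complete answer):
# Windows reserved device names plus the trailing dot/space rule
WINDOWS_RESERVED = {'CON', 'PRN', 'AUX', 'NUL'} | \
                   {'COM%d' % i for i in range(1, 10)} | \
                   {'LPT%d' % i for i in range(1, 10)}

def looks_portable(name):
    if __removeIllegalChars(name) != name:        # contains illegal characters
        return False
    if name != name.rstrip(' .'):                 # ends with a space or a dot
        return False
    if name.split('.')[0].upper() in WINDOWS_RESERVED:
        return False                              # e.g. 'con', 'COM1.txt'
    return True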
Do not end a file or directory name with a space or a period. Although
the underlying file system may support such names, the Windows shell
and user interface does not.
http://msdn.microsoft.com/en-us/library/aa365247.aspx#naming_conventions
That page will give you information about other illegal names too, for Windows that is, including CON, as you said yourself.
If you respect those (seemingly harsh) rules, I think you'll be safe on Linux and most other systems too.
I would like to inject the revision number in source code on commit.
I found out that I could do it through svn shell by doing something like:
find . -name '*.php' -exec svn propset svn:keywords "Rev" {} \;
However, someone else said that this would not work as there are no plain files in the repository (as the files are encrypted), and that I should be able to do it in TortoiseSVN instead. I found the "Hook Scripts" section, but I have no experience with this stuff at all.
Could you give me some indication how the command should look like, if I would like to have the first lines of code look like:
/*
* Version: 154
* Last modified on revision: 150
*/
I know that you can inject it by using $ver$, but how do I do it so that only files in certain directories with certain extensions get this treatment?
Don't write your own method for injecting version numbers. Instead:
* only introduce the tags to be replaced ($Revision$, etc.) in the files you want the replacement to happen for
* only enable replacement (using svn propset svn:keywords Revision or some such) for those files
As the question title might suggest, I would very much like to know of a way to check the NTFS permissions of a given file or folder (hint: those are the ones you see in the "Security" tab). Basically, what I need is to take a path to a file or directory (on the local machine, or, preferably, on a share on a remote machine) and get the list of users/groups and the corresponding permissions for this file/folder. Ultimately, the application is going to traverse a directory tree, reading permissions for each object and processing them accordingly.
Now, I can think of a number of ways to do that:
parse cacls.exe output -- easily done, BUT, unless I'm missing something, cacls.exe only gives the permissions in the form of R|W|C|F (read/write/change/full), which is insufficient (I need to get permissions like "List folder contents" and the extended permissions too)
xcacls.exe or xcacls.vbs output -- yes, they give me all the permissions I need, but they are dreadfully slow; it takes xcacls.vbs about ONE SECOND to get the permissions of a single local system file. Such speed is unacceptable
win32security (it wraps around winapi, right?) -- I am sure it can be handled like this, but I'd rather not reinvent the wheel
Is there anything else I am missing here?
Unless you fancy rolling your own, win32security is the way to go. There's the beginnings of an example here:
http://timgolden.me.uk/python/win32_how_do_i/get-the-owner-of-a-file.html
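For reference, a minimal win32security sketch along those lines (not taken from the linked page) might look like this; it just prints trustees with their raw access masks, and decoding the masks into the friendly "List folder contents"-style rights is left out:
import win32security

def dump_dacl(path):
    sd = win32security.GetFileSecurity(
        path, win32security.DACL_SECURITY_INFORMATION)
    dacl = sd.GetSecurityDescriptorDacl()
    for i in range(dacl.GetAceCount()):
        # each standard file ACE is ((ace_type, ace_flags), access_mask, sid)
        (ace_type, ace_flags), mask, sid = dacl.GetAce(i)
        name, domain, _ = win32security.LookupAccountSid(None, sid)
        print("%s\\%s -> 0x%08x" % (domain, name, mask))

dump_dacl(r"c:\temp")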
If you want to live slightly dangerously (!) my in-progress winsys package is designed to do exactly what you're after. You can get an MSI of the dev version here:
http://timgolden.me.uk/python/downloads/WinSys-0.4.win32-py2.6.msi
or you can just checkout the svn trunk:
svn co http://winsys.googlecode.com/svn/trunk winsys
To do what you describe (guessing slightly at the exact requirements) you could do this:
import codecs
from winsys import fs

base = "c:/temp"

with codecs.open ("permissions.log", "wb", encoding="utf8") as log:
    for f in fs.flat (base):
        log.write ("\n" + f.filepath.relative_to (base) + "\n")
        for ace in f.security ().dacl:
            access_flags = fs.FILE_ACCESS.names_from_value (ace.access)
            log.write (u"  %s => %s\n" % (ace.trustee, ", ".join (access_flags)))
TJG