I'm trying to write a sphinx extension that performs a source-level transformation, but I don't know how to actually change the output file.
My extension looks something like this:
def my_source_handler(app, docname, source):
    import re
    print 'test'
    source = [re.sub("foo", "bar", source[0])]
    return source
def setup(app):
    app.connect('source-read', my_source_handler)
    app.add_config_value('my_source_handler_include', True, False)
However, when I add the module to the extensions list and build html, it prints the 'test' but does not actually change the "foo"s to "bar"s in the output HTML file.
The Sphinx documentation is a little vague, saying, "You can process the contents and replace this item to implement source-level transformations" with regards to the source argument.
The problem is I'm not sure how I'm supposed to go about replacing the source argument.
Actually, after a little digging I figured it out: you're supposed to replace the contents of the first (and only) element of source, not rebind source itself, like so:
def my_source_handler(app, docname, source):
    import re
    print 'test'
    source[0] = re.sub("foo", "bar", source[0])
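For reference, the complete minimal extension with that fix applied would look something like the following sketch (the my_source_handler_include config value from the question is left out for brevity):

import re

def my_source_handler(app, docname, source):
    # source is a one-element list; mutate it in place so Sphinx
    # picks up the transformed text.
    source[0] = re.sub("foo", "bar", source[0])

def setup(app):
    app.connect('source-read', my_source_handler)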
To generate JavaScript from Python in the Transcrypt Python to JS compiler, Python 3.5's ast module is used in combination with the following code:
class Generator (ast.NodeVisitor):
    ...
    ...
    def visit_Str (self, node):
        self.emit (repr (node.s))  # Simplified to need less context on StackOverflow
    ...
    ...
This works fine e.g. for the following line of Python:
test = "âäéèêëiîïoôöùüû"
which is correctly translated to:
var test = 'âäéèêëiîïoôöùüû';
Only the character à gives problems:
test = "àâäéèêëiîïoôöùüû"
is translated to:
var test = 'Ã\xa0âäéèêëiîïoôöùüû';
Is there any way to have the ast module read the source file respecting coding directives like:
# coding=<encoding name>
To open a Python file for parsing, use tokenize.open rather than the ordinary open function. It opens the file, reads the PEP 263 coding hint, and returns the open file as if it had been opened by the ordinary open using the right encoding.
This was quite hard to find; it is not currently in the Green Tree Snakes doc. I actually found it by searching for 'coding' in the CPython sources on GitHub. I have created an issue for the Green Tree Snakes doc to add this.
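For example, a minimal sketch (the filename here is just a placeholder):

import ast
import tokenize

# tokenize.open() reads the PEP 263 coding declaration (falling back to
# UTF-8) and returns a text-mode file decoded with that encoding.
with tokenize.open('some_module.py') as f:
    tree = ast.parse(f.read())
print(ast.dump(tree))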
I have a text file /etc/default/foo which contains one line:
FOO="/path/to/foo"
In my python script, I need to reference the variable FOO.
What is the simplest way to "source" the file /etc/default/foo into my python script, same as I would do in bash?
. /etc/default/foo
Same answer as @jil's; however, that answer is specific to a historical version of Python.
In modern Python (3.x):
exec(open('filename').read())
replaces execfile('filename') from 2.x
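For the file from the question, a quick sketch (safe only if you fully trust the file's contents, as the next answer points out):

# /etc/default/foo contains:  FOO="/path/to/foo"
# exec() runs the assignments in the current namespace.
exec(open('/etc/default/foo').read())
print(FOO)  # -> /path/to/foo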
You could use execfile:
execfile("/etc/default/foo")
But please be aware that this will evaluate the contents of the file as-is into your program source. It is a potential security hazard unless you can fully trust the source.
It also means that the file needs to be valid python syntax (your given example file is).
Keep in mind that if you have a "text" file with this content that has a .py as the file extension, you can always do:
import mytextfile
print(mytextfile.FOO)
Of course, this assumes that the text file is syntactically correct as far as Python is concerned. On a project I worked on we did something similar to this. Turned some text files into Python files. Wacky but maybe worth consideration.
Just to give a different approach, note that if your original file is setup as
export FOO=/path/to/foo
You can do source /etc/default/foo; python myprogram.py (or . /etc/default/foo; python myprogram.py) and within myprogram.py all the values that were exported in the sourced file are visible in os.environ, e.g.:
import os
os.environ["FOO"]
If you know for certain that it only contains VAR="QUOTED STRING" style variables, like this:
FOO="some value"
Then you can just do this:
>>> with open('foo.sysconfig') as fd:
...     exec(fd.read())
Which gets you:
>>> FOO
'some value'
(This is effectively the same thing as the execfile() solution
suggested in the other answer.)
This method has substantial security implications; if instead of FOO="some value" your file contained:
os.system("rm -rf /")
Then you would be In Trouble.
Alternatively, you can do this:
>>> import shlex
>>> with open('foo.sysconfig') as fd:
...     settings = {var: shlex.split(value) for var, value in [line.split('=', 1) for line in fd]}
Which gets you a dictionary settings that has:
>>> settings
{'FOO': ['some value']}
That settings = {...} line is using a dictionary comprehension. You could accomplish the same thing in a few more lines with a for loop and so forth.
And of course if the file contains shell-style variable expansion like ${somevar:-value_if_not_set} then this isn't going to work (unless you write your very own shell style variable parser).
There are a couple ways to do this sort of thing.
You can indeed import the file as a module, as long as the data it contains corresponds to Python's syntax. But then either the file in question must be a .py in the same directory as your script, or you have to use imp (or importlib, depending on your version) as described here.
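For the second case, a sketch using importlib (Python 3.4+); the explicit SourceFileLoader is my assumption here, needed because the file has no .py suffix:

import importlib.machinery
import importlib.util

# Load /etc/default/foo as a module named 'foo' (the name is arbitrary).
loader = importlib.machinery.SourceFileLoader('foo', '/etc/default/foo')
spec = importlib.util.spec_from_loader('foo', loader)
module = importlib.util.module_from_spec(spec)
loader.exec_module(module)
print(module.FOO)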
Another solution (the one I prefer) is to use a data format that any Python library can parse (JSON comes to mind as an example).
/etc/default/foo :
{"FOO":"path/to/foo"}
And in your python code :
import json

with open('/etc/default/foo') as f:
    data = json.load(f)

FOO = data["FOO"]
## ...
This way, you don't risk executing untrusted code...
You have the choice, depending on what you prefer. If your data file is auto-generated by some script, it might be easier to keep a simple syntax like FOO="path/to/foo" and use imp.
Hope that helps!
The Solution
Here is my approach: parse the bash file myself and process only variable assignment lines such as:
FOO="/path/to/foo"
Here is the code:
import shlex


def parse_shell_var(line):
    """
    Parse such lines as:
        FOO="My variable foo"

    :return: a tuple of var name and var value, such as
        ('FOO', 'My variable foo')
    """
    return shlex.split(line, posix=True)[0].split('=', 1)


if __name__ == '__main__':
    with open('shell_vars.sh') as f:
        shell_vars = dict(parse_shell_var(line) for line in f if '=' in line)
    print(shell_vars)
How It Works
Take a look at this snippet:
shell_vars = dict(parse_shell_var(line) for line in f if '=' in line)
This line iterates through the lines of the shell script, processing only those that contain an equal sign (not a fool-proof way to detect variable assignment, but the simplest). Next, it runs those lines through the function parse_shell_var, which uses shlex.split to correctly handle the quotes (or the lack thereof). Finally, the pieces are assembled into a dictionary. The output of this script is:
{'MOO': '/dont/have/a/cow', 'FOO': 'my variable foo', 'BAR': 'My variable bar'}
Here is the contents of shell_vars.sh:
FOO='my variable foo'
BAR="My variable bar"
MOO=/dont/have/a/cow
echo $FOO
Discussion
This approach has a couple of advantages:
It does not execute the shell (either in bash or in Python), which avoids any side effects
Consequently, it is safe to use, even if the origin of the shell script is unknown
It correctly handles values with or without quotes
This approach is not perfect, it has a few limitations:
The method of detecting variable assignment (by looking for the presence of the equal sign) is primitive and not accurate. There are ways to better detect these lines but that is the topic for another day
It does not correctly parse values which are built upon other variables or commands. That means, it will fail for lines such as:
FOO=$BAR
FOO=$(pwd)
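For instance, the parser accepts these lines without error but keeps the literal, unexpanded text:

>>> parse_shell_var('FOO=$BAR')
['FOO', '$BAR']
>>> parse_shell_var('FOO=$(pwd)')
['FOO', '$(pwd)']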
Based on the answer that uses exec(open(...).read()): if the file contains a single expression, value = eval(open(...).read()) will return just that expression's value. E.g.:
1 + 1 gives 2
"Hello World" gives "Hello World"
float(2) + 1 gives 3.0
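In code, that amounts to something like this sketch (expr.txt is a placeholder filename; the same trust caveats as with exec() apply):

# expr.txt contains a single Python expression, e.g.:  float(2) + 1
with open('expr.txt') as f:
    value = eval(f.read())  # -> 3.0 for the example above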
I am able to use Python's ConfigParser library to read /etc/sysctl.conf by adding a [dummy] section and overriding ConfigParser's read() method as follows:
import ConfigParser
import StringIO

class SysctlConfigParser(ConfigParser.ConfigParser):
    def read(self, fn):
        text = open(fn).read()
        contents = StringIO.StringIO("[dummy]\n" + text)
        self.readfp(contents, fn)
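For context, a sketch of how this parser is then used (the option name comes from the sample output below):

parser = SysctlConfigParser()
parser.read('/etc/sysctl.conf')
# Every setting ends up under the artificial [dummy] section.
print(parser.get('dummy', 'net.netfilter.nf_conntrack_max'))
parser.set('dummy', 'net.netfilter.nf_conntrack_max', '313')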
Now the tricky part is writing back the configuration updates that my Python program made, because if I now called ConfigParser.write() directly, it would add back this [dummy] section as well:
[dummy]
net.netfilter.nf_conntrack_max = 313
net.netfilter.nf_conntrack_expect_max = 640
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 5
Here are my questions:
Is there an elegant way to make ConfigParser not add this [dummy] section? It seems odd if I have to open the file again just to remove the first line that contains the dummy section.
Maybe ConfigParser is not the right tool to edit sysctl.conf? If so, are there any other Python libraries that would allow updating sysctl.conf in a convenient way from Python?
ConfigParser is designed for parsing INI-style configuration files. /etc/sysctl.conf is not that sort of file.
You could use the Augeas bindings for Python if you want a parser that works out-of-the-box:
import augeas
aug = augeas.Augeas()
aug.set('/files/etc/sysctl.conf/net.ipv4.ip_forwarding', '1')
aug.set('/files/etc/sysctl.conf/fs.inotify.max_user_watches', '8192')
aug.save()
The format of the file is pretty trivial (just a collection of <name> = <value> lines with optional comments).
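Given that, a hand-rolled parser is only a few lines; a sketch, assuming just name = value lines with # or ; comments:

def parse_sysctl(path):
    """Parse a sysctl.conf-style file into a dict of name -> value."""
    settings = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith(('#', ';')):
                continue
            name, _, value = line.partition('=')
            settings[name.strip()] = value.strip()
    return settings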
Here are two different files that my python (2.6) script encounters. One will parse, the other will not. I'm just curious as to why this happens.
This xml file will not parse and the script will fail:
<Landfire_Feedback_Point_xlsform id="fbfm40v10" instanceID="uuid:9e062da6-b97b-4d40-b354-6eadf18a98ab" submissionDate="2013-04-30T23:03:32.881Z" isComplete="true" markedAsCompleteDate="2013-04-30T23:03:32.881Z" xmlns="http://opendatakit.org/submissions">
<date_test>2013-04-17</date_test>
<plot_number>10</plot_number>
<select_multiple_names>BillyBob</select_multiple_names>
<geopoint_plot>43.2452830500 -118.2149402900 210.3000030518 3.0000000000</geopoint_plot><fbfm40_new>GS2</fbfm40_new>
<select_grazing>NONE</select_grazing>
<image_close>1366230030355.jpg</image_close>
<plot_note>No road present.</plot_note>
<n0:meta xmlns:n0="http://openrosa.org/xforms">
<n0:instanceID>uuid:9e062da6-b97b-4d40-b354-6eadf18a98ab</n0:instanceID>
</n0:meta>
</Landfire_Feedback_Point_xlsform>
This xml file will parse correctly and the script succeeds:
<Landfire_Feedback_Point_xlsform id="fbfm40v10">
<date_test>2013-05-14</date_test>
<plot_number>010</plot_number>
<select_multiple_names>BillyBob</select_multiple_names>
<geopoint_plot>43.26630563 -118.39881809 351.70001220703125 5.0</geopoint_plot>
<fbfm40_new>GR1</fbfm40_new>
<select_grazing>HIGH</select_grazing>
<image_close>fbfm40v10_PLOT_010_ID_6.jpg</image_close>
<plot_note>Heavy grazing</plot_note>
<meta><instanceID>uuid:90e7d603-86c0-46fc-808f-ea0baabdc082</instanceID></meta>
</Landfire_Feedback_Point_xlsform>
Here is a little Python script that demonstrates that one will work while the other will not. I'm just looking for an explanation as to why one is seen by ElementTree as an XML file while the other isn't. Specifically, the one that doesn't seem to parse fails with "'NoneType' object has no attribute 'text'" or something similar. But it's as if ElementTree doesn't consider the file to be XML, or can't see any elements beyond the opening line. Any explanation or direction with regard to this error would be appreciated. Thanks in advance.
Python script:
import os
from xml.etree import ElementTree
def replace_xml_attribute_in_file(original_file, element_name, attribute_value):
    #THIS FUNCTION ONLY WORKS ON XML FILES WITH UNIQUE ELEMENT NAMES
    # -DUPLICATE ELEMENT NAMES WILL ONLY GET THE FIRST ELEMENT WITH A GIVEN NAME

    #split original filename and add tempfile name
    tempfilename = "temp.xml"
    rootsplit = original_file.rsplit('\\')  #split the root directory on the backslash
    rootjoin = '\\'.join(rootsplit[:-1])  #rejoin the root directory parts with a backslash -minus the last
    temp_file = os.path.join(rootjoin, tempfilename)

    et = ElementTree.parse(original_file)
    author = et.find(element_name)
    author.text = attribute_value
    et.write(temp_file)

    if os.path.exists(temp_file) and os.path.exists(original_file):  #if both the original and the temp files exist
        os.remove(original_file)  #erase the original
        os.rename(temp_file, original_file)  #rename the new file
    else:
        print "Something went wrong."

replace_xml_attribute_in_file("testfile1.xml", "image_close", "whoopdeedoo.jpg")
Here is a little python script that demonstrates that one will work, while the other will not. I'm just looking for an explanation as to why one is seen by ElementTree as an xml file while the other isn't.
Your code doesn't demonstrate that at all. It demonstrates that they're both seen by ElementTree as valid XML files chock full of nodes. They both parse just fine, they both read past the first line, etc.
The only problem is that the first one doesn't have a node named 'image_close', so your code doesn't work.
You can see that pretty easily:
for node in et.getroot().getchildren():
    print node.tag
You get 9 children of the root, with either version.
And the output to that should show you the problem. The node you want is actually named {http://opendatakit.org/submissions}image_close in the first example, rather than image_close as in the second.
And, as you can probably guess, this is because of the xmlns="http://opendatakit.org/submissions" on the root node. ElementTree uses "James Clark notation" for mapping namespaced names to universal names.
Anyway, because none of the nodes are named image_close, the et.find(element_name) returns None, so your code stores author=None, then tries to assign to author.text, and gets an error.
As for how to fix this problem… well, you could learn how namespaces work by default in ElementTree, or you could upgrade to Python 2.7 or install a newer ElementTree for 2.6 that lets you customize things more easily. But if you want to do custom namespace handling and also stick with your old version… I'd start with this article (and its two predecessors) and this one.
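For completeness, a sketch of the namespace-aware lookup (the odk prefix is arbitrary; passing a namespaces dict to find() requires the ElementTree shipped with Python 2.7+):

from xml.etree import ElementTree

et = ElementTree.parse('testfile1.xml')

# Option 1: spell out the universal name (works on 2.6 as well).
node = et.find('{http://opendatakit.org/submissions}image_close')

# Option 2: use a prefix map (ElementTree 1.3, i.e. Python 2.7+).
node = et.find('odk:image_close', {'odk': 'http://opendatakit.org/submissions'})

node.text = 'whoopdeedoo.jpg'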
I'm currently migrating all existing (incomplete) documentation to Sphinx.
The problem is that the documentation uses Python docstrings (the module is written in C, but it probably does not matter) and the class documentation must be converted into a form usable for Sphinx.
There is sphinx.ext.autodoc, but it automatically puts the current docstrings into the document. I want to generate a source file (in RST) based on the current docstrings, which I could then edit and improve manually.
How would you transform docstrings into RST for Sphinx?
autodoc does generate RST; there is just no official way to get it out. The easiest hack to get it was to change the sphinx.ext.autodoc.Documenter.add_line method so it emits each line it receives.
As all I want is one time migration, output to stdout is good enough for me:
def add_line(self, line, source, *lineno):
    """Append one line of generated reST to the output."""
    print(self.indent + line)
    self.directive.result.append(self.indent + line, source, *lineno)
Now autodoc prints generated RST on stdout while running and you can simply redirect or copy it elsewhere.
Monkey-patching autodoc so it works without needing to edit anything:
import sphinx.ext.autodoc

rst = []

def add_line(self, line, source, *lineno):
    """Append one line of generated reST to the output."""
    rst.append(line)
    self.directive.result.append(self.indent + line, source, *lineno)

sphinx.ext.autodoc.Documenter.add_line = add_line

try:
    sphinx.main(['sphinx-build', '-b', 'html', '-d', '_build/doctrees', '.', '_build/html'])
except SystemExit:
    with file('doc.rst', 'w') as f:
        for line in rst:
            print >>f, line
As far as I know there are no automated tools to do this. My approach would therefore be to write a small script that reads the relevant modules (based on sphinx.ext.autodoc) and throws the docstrings into a file (formatted appropriately).
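A rough sketch of such a script, not using autodoc at all (mymodule is a placeholder for the module you want to document):

import inspect
import mymodule  # placeholder: the module whose docstrings you want

with open('doc.rst', 'w') as out:
    for name, obj in inspect.getmembers(mymodule):
        if (inspect.isclass(obj) or inspect.isfunction(obj)) and inspect.getdoc(obj):
            # Write an RST section header followed by the raw docstring.
            out.write(name + '\n' + '-' * len(name) + '\n\n')
            out.write(inspect.getdoc(obj) + '\n\n')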