Exclude header comments when generating documentation with sphinx - python

I am trying to use sphinx to generate documentation for my python package. I have included well documented docstrings within all of my functions. I have called sphinx-quickstart to create the template, filled in the template, and called make html to generate the documentation. My problem is that I also have comments within my python module that are also showing up in the rendered documentation. The comments are not within the functions, but rather as a header at the top of the .py files. I am required to have these comment blocks and cannot remove them, but I don't want them to show up in my documentation.
I'm currently using automodule to pull all of the functions out of the module. I know I can use autofunction to get the individual functions one by one, and this avoids the file headers, but I have a lot of functions, and there must be a better way.
How can I adjust my conf.py file, or something else to use automodule, but to only pick up the functions and not the comments outside of the functions?
This is what my .py file looks like:
"""
This code is a part of a proj repo.
https://github.com/CurtLH/proj
author: greenbean4me
date: 2018-02-07
"""
def hit(nail):
"""
Hit the nail on the head
This function takes a string and returns the first character in the string
Parameter
----------
nail : str
Can be any word with any length
Return
-------
x : str
First character in the string
Example
-------
>>> x = hit("hello")
>>> print(x)
>>> "h"
"""
return nail[0]
This is what the auto-generated documentation looks like:
Auto-Generated Documentation
This code is a part of a proj repo.
https://github.com/CurtLH/proj
author: greenbean4me date: 2018-02-07
hammer.hit(nail)
Hit the nail on the head
This function takes a string and returns the first character in the string
nail : str
Can be any word with any length
x : str
First character in the string
>>> x = hit("hello")
>>> print(x)
>>> "h"
For a more comprehensive example, check out this example repo on GitHub: https://github.com/CurtLH/proj

As far as I know, there is no way to configure autodoc not to do this.
There is, however, a simple workaround: Adding an empty docstring at the top of your module.
"""""" # <- add this
"""
This code is a part of a proj repo.
https://github.com/CurtLH/proj
author: greenbean4me
date: 2018-02-07
"""
It's barely noticeable, and will trick autodoc into not displaying your module's real docstring.
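An alternative that leaves the source files untouched is to strip the docstring at build time: autodoc emits a documented autodoc-process-docstring event that you can hook from conf.py. A minimal sketch (the handler name is mine; it blanks module-level docstrings while leaving function docstrings alone):

# conf.py
def skip_module_docstring(app, what, name, obj, options, lines):
    if what == "module":
        lines[:] = []  # modify in place; autodoc renders whatever is left

def setup(app):
    app.connect("autodoc-process-docstring", skip_module_docstring)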

Related

Python comment-preserving parsing using only builtin libraries?

I wrote a library using just the ast and inspect modules to parse and emit internal Python constructs (it uses astor on Python < 3.9).
Just realised that I really need to preserve comments after all. Preferably without resorting to RedBaron or LibCST, as I just need to emit the unaltered commentary; is there a clean and concise way of comment-preserving parsing/emitting Python source with just the stdlib?
What I ended up doing was writing a simple parser, without a meta-language, in 339 source lines:
https://github.com/offscale/cdd-python/blob/master/cdd/cst_utils.py
Implementation of Concrete Syntax Tree [List!]
Reads source character by character;
Once end of statement† is detected, add statement-type into 1D list;
†end of line if line.lstrip().startswith("#") or line not endswith('\\') and balanced_parens(line) else continue munching until that condition is true… plus some edge-cases around multiline strings and the like (written out as code just below);
Once finished there is a big (1D) list where each element is a namedtuple with a value property.
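The † condition can be written out as a sketch (here balanced_parens stands in for the hypothetical helper from the footnote, reporting whether all brackets opened on the line are closed):

def is_end_of_statement(line, balanced_parens):
    """Sketch of the † footnote: a lone comment line ends a statement;
    otherwise the line must not be continued with a backslash and all
    opened brackets must be balanced (multiline strings excluded)."""
    if line.lstrip().startswith("#"):
        return True
    return not line.rstrip("\n").endswith("\\") and balanced_parens(line)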
Integration with builtin Abstract Syntax Tree ast
Limit the ast nodes to modify (not remove) to: {ClassDef, AsyncFunctionDef, FunctionDef} docstrings (the first body element, a Constant|Str), plus Assign and AnnAssign;
cst_idx, cst_node = find_cst_at_ast(cst_list, _node);
if doc_str node then maybe_replace_doc_str_in_function_or_class(_node, cst_idx, cst_list)
…
Now the cst_list contains only the changes to those aforementioned nodes, and only when the change is more than whitespace; it can be joined back into a string with "".join(map(attrgetter("value"), cst_list)) for outputting to eval or straight out to a source file (e.g., overriding it in place).
Quality control
100% test coverage
100% doc coverage
Support for last 6 versions of Python (including latest alpha)
CI/CD
(Apache-2.0 OR MIT) licensed
Limitations
Lack of meta-language, specifically lack of using Python's provided grammar means new syntax elements won't automatically be supported (match/case is supported, but if there's new syntax introduced since, it isn't [yet?] supported… at least not automatically);
Not built into the stdlib, so stdlib changes could break compatibility;
Deleting nodes is [probably] not supported;
Nodes can be incorrectly identified if there are shadow variables or similar issues that linters should point out.
Comments can be preserved by merging them back into the generated source code by capturing them with the tokenizer.
Given a toy program in a program variable, we can demonstrate how comments get lost in the AST:
import ast
program = """
# This comment lost
p1v = 4 + 4
p1l = ['a', # Implicit line joining comment for a lost
'b'] # Ending comment for b lost
def p1f(x):
"p1f docstring"
# Comment in function p1f lost
return x
print(p1f(p1l), p1f(p1v))
"""
tree = ast.parse(program)
print('== Full program code:')
print(ast.unparse(tree))
The output shows all comments gone:
== Full program code:
p1v = 4 + 4
p1l = ['a', 'b']
def p1f(x):
    """p1f docstring"""
    return x
print(p1f(p1l), p1f(p1v))
However, if we scan the comments with the tokenizer, we can
use this to merge the comments back in:
from io import StringIO
import tokenize

def scan_comments(source):
    """ Scan source code file for relevant comments
    """
    # Find token for comments (tokenize.COMMENT also holds this value)
    for k, v in tokenize.tok_name.items():
        if v == 'COMMENT':
            comment = k
            break
    comtokens = []
    with StringIO(source) as f:
        tokens = tokenize.generate_tokens(f.readline)
        for token in tokens:
            if token.type != comment:
                continue
            comtokens += [token]
    return comtokens

comtokens = scan_comments(program)
print('== Comment after p1l[0]\n\t', comtokens[1])
Output (edited to split long line):
== Comment after p1l[0]
	 TokenInfo(type=60 (COMMENT),
	           string='# Implicit line joining comment for a lost',
	           start=(4, 12), end=(4, 54),
	           line="p1l = ['a', # Implicit line joining comment for a lost\n")
Using a slightly modified version of ast.unparse(), replacing the methods maybe_newline() and traverse() with modified versions, you should be able to merge all comments back in at their approximate locations, using the location info from the comment scanner (the start attribute) combined with the location info from the AST; most nodes have a lineno attribute.
Approximate, because the locations do not match exactly: see for example the list variable assignment. The source code is split over two lines, but ast.unparse() generates only one line (see the output in the second code segment). Also, you need to make sure to update the location info in the AST using ast.increment_lineno() after adding code. And it seems some more calls to maybe_newline() might be needed in the library code (or its replacement).

Snippets vs. Abbreviations in Vim

What advantages and/or disadvantages are there to using a "snippets" plugin, e.g. snipmate, ultisnips, for VIM as opposed to simply using the builtin "abbreviations" functionality?
Are there specific use-cases where declaring iabbr, cabbr, etc. lack some major features that the snippets plugins provide? I've been unsuccessful in finding a thorough comparison between these two "features" and their respective implementations.
As @peter-rincker pointed out in a comment:
It should be noted that abbreviations can execute code as well. Often via <c-r>= or via an expression abbreviation (<expr>). Example which expands ## to the current file's path: :iabbrev ## <c-r>=expand('%:p')<cr>
As an example for python, let's compare a snipmate snippet and an abbrev in Vim for inserting lines for class declaration.
Snipmate
# New Class
snippet cl
	class ${1:ClassName}(${2:object}):
		"""${3:docstring for $1}"""
		def __init__(self, ${4:arg}):
			${5:super($1, self).__init__()}
			self.$4 = $4
		${6}
Vimscript
au FileType python :iabbr cl class ClassName(object):<CR><Tab>"""docstring for ClassName"""<CR>def __init__(self, arg):<CR><Tab>super(ClassName, self).__init__()<CR>self.arg = arg
Am I missing some fundamental functionality of "snippets", or am I correct in assuming they are overkill for the most part, when Vim's abbr and :help template templates are able to do most of the stuff snippets do?
I assume it's easier to implement snippets, and they provide additional aesthetic/visual features. For instance, if I use abbr in Vim along with other plugins for running/testing python code inside Vim (e.g. syntastic, pytest, ropevim, pep8, etc.), am I missing out on some key features that snippets provide?
Everything that can be done with snippets can be done with abbreviations and vice-versa. You can have (mirrored or not) placeholders with abbreviations, you can have context-sensitive snippets.
There are two important differences:
Abbreviations are triggered when the abbreviation text has been typed and a non-word character (or esc) is hit. Snippets are triggered on demand, and shortcuts are possible (no need to type while + tab; w + tab may be enough).
It's much easier to define new snippets (or to maintain old ones) than to define abbreviations. With abbreviations, a lot of boilerplate code is required when we want to do neat things.
There are a few other differences. For instance, abbreviations are always triggered everywhere. Seeing for expanded into for(placeholder) {\n} within a comment or a string context is certainly not what the end-user expects. With snippets, this is not a problem any more: we can expect the end-user to know what he's doing when he asks to expand a snippet. Still, we can propose context-aware snippets that expand throw into @throw {domain::exception} {explanation} within a comment, or into throw domain::exception({message}); elsewhere.
Snippets
Rough superset of Vim's native abbreviations. Here are the highlights:
Only trigger on key press
Uses placeholders which a user can jump between
Exist only for insert mode
Dynamic expansions
Abbreviations
Great for common typos and small snippets.
Native to Vim so no need for plugins
Typically expand on whitespace or <c-]>
Some special rules on trigger text (See :h abbreviations)
Can be used in command mode via :cabbrev (often used to create command aliases)
No placeholders
Dynamic expansions
Conclusion
For the most part snippets are more powerful and provide many features that other editors enjoy, but you can use both, and many people do. Abbreviations enjoy the benefit of being native, which can be useful in remote environments. Abbreviations also enjoy another clear advantage: they can be used in command mode.
Snippets are more powerful.
Depending on the implementation, snippets can let you change (or accept defaults for) multiple placeholders and can even execute code when the snippet is expanded.
For example with ultisnips, you can have it execute shell commands and Vimscript, but also Python code.
An (ultisnips) example:
snippet hdr "General file header" b
# file: `!v expand('%:t')`
# vim:fileencoding=utf-8:ft=`!v &filetype`
# ${1}
#
# Author: ${2:J. Doe} ${3:<jdoe@gmail.com>}
# Created: `!v strftime("%F %T %z")`
# Last modified: `!v strftime("%F %T %z")`
endsnippet
This presents you with three placeholders to fill in (it gives default values for two of them), and sets the filename, filetype and current date and time.
After the word "snippet", the start line contains three items;
the trigger string,
a description and
options for the snippet.
Personally I mostly use the b option where the snippet is expanded at the beginning of a line and the w option that expands the snippet if the trigger string starts at the beginning of a word.
Note that you have to type the trigger string and then input a key or key combination that actually triggers the expansion. So a snippet is not expanded unless you want it to.
Additionally, snippets can be specialized by filetype. Suppose you want to define four levels of headings, h1 .. h4. You can have the same name expand differently between e.g. an HTML, markdown, LaTeX or restructuredtext file.
snippets are like the built-in :abbreviate on steroids, usually with:
parameter insertions: You can insert (type or select) text fragments in various places inside the snippet. An abbreviation just expands once.
mirroring: Parameters may be repeated (maybe even in transformed fashion) elsewhere in the snippet, usually updated as you type (see the sketch after this list).
multiple stops inside: You can jump from one point to another within the snippet, sometimes even recursively expand snippets within one.
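For instance, a minimal ultisnips sketch of mirroring (the snippet is mine, not from any shipped collection): repeating $1 in the docstring makes it update live as you type the function name.

snippet fn "def with mirrored name" b
def ${1:name}(${2:args}):
    """$1: ${3:description}"""
    pass
endsnippet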
There are three things to evaluate in a snippet plugin: First, the features of the snippet engine itself, second, the quality and breadth of snippets provided by the author or others; third, how easy it is to add new snippets.

What is this structured comment and what is "updated" field for?

I am working on Python in Eclipse + PyDev. When I create a new module from "Module: CLI (argparse)" template, Eclipse creates the following comment block in the beginning of the file.
#!/usr/local/bin/python2.7
# encoding: utf-8
'''
packagename.newfile -- shortdesc

packagename.newfile is a description

It defines classes_and_methods

@author:     user_name
@copyright:  2015 organization_name. All rights reserved.
@license:    license
@contact:    user_email
@deffield    updated: Updated
'''
It appears to have some kind of structure; I guess it's parsed by something. I have a few questions:
What is this type of comment structure called, and by what programs is it used?
How do I include a multiline Apache License 2.0 notice under @license?
What is the updated field for?
Most Python documentation in this form adheres to the Epydoc standard.
http://epydoc.sourceforge.net/manual-fields.html shows details of all of the fields that you've listed above. This file can then be parsed by Epydoc and documentation created from that.
The following example shows that multiline documentation is allowed:
def example():
    """
    @param x: This is a description of
        the parameter x to a function.

        Note that the description is
        indented four spaces.

    @type x: This is a description of
        x's type.

    @return: This is a description of
        the function's return value.

        It contains two paragraphs.
    """
    #[...]
This was sourced from http://epydoc.sourceforge.net/epydoc.html#fields
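Applying the same continuation rule, a multiline @license field for the Apache License 2.0 notice could look like this (a sketch; the notice text is truncated):

'''
@license:    Licensed under the Apache License, Version 2.0 (the "License");
             you may not use this file except in compliance with the License.
             You may obtain a copy of the License at

             http://www.apache.org/licenses/LICENSE-2.0
'''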
The updated part is showing the ability to add in any @-style annotation. See http://epydoc.sourceforge.net/epydoc.html#adding-new-fields.
So @deffield updated: Updated means that there's a new annotation @updated. This would be used as follows:
"""
@updated 17/02/2015
"""
This would then be rendered into the HTML created from Epydoc.

PyYAML and unusual tags

I am working on a project that uses the Unity3D game engine. For some of the pipeline requirements, it is best to be able to update some files from external tools using Python. Unity's meta and anim files are in YAML, so I thought this would be straightforward enough using PyYAML.
The problem is that Unity's format uses custom attributes, and I am not sure how to work with them, as all the examples show more common tags used by Python and Ruby.
Here is what the top lines of a file look like:
%YAML 1.1
%TAG !u! tag:unity3d.com,2011:
--- !u!74 &7400000
AnimationClip:
  m_ObjectHideFlags: 0
  m_PrefabParentObject: {fileID: 0}
...
When I try to read the file I get this error:
could not determine a constructor for the tag 'tag:unity3d.com,2011:74'
Now, after looking at all the other questions asked, this tag scheme does not seem to resemble those questions and answers. For example, this file uses "!u!", and I was unable to figure out what it means or how something similar would behave (my wild uneducated guess is that it looks like an alias or namespace).
I can do a hack way and strip the tags out but that is not the ideal way to try to do this. I am looking for help on a solution that will properly handle the tags and allow me to parse & encode the data in a way that preserves the proper format.
Thanks,
-R
I also had this problem, and the internet was not very helpful. After bashing my head against this problem for 3 days, I was able to sort it out...or at least get a working solution. If anyone wants to add more info, please do. But here's what I got.
1) The documentation on Unity's YAML file format (they call it a "textual scene file" because it contains text that is human-readable): http://docs.unity3d.com/Manual/TextualSceneFormat.html
It is a YAML 1.1 compliant format. So you should be able to use PyYAML or any other Python YAML library to load up a YAML object.
Okay, great. But it doesn't work. Every YAML library has issues with this file.
2) The file is not correctly formed. It turns out, the Unity file has some syntactical issues that make YAML parsers error out on it. Specifically:
2a) At the top, it uses a %TAG directive to create an alias for the string "unity3d.com,2011". It looks like:
%TAG !u! tag:unity3d.com,2011:
What this means is anywhere you see "!u!", replace it with "tag:unity3d.com,2011".
2b) Then it goes on to use "!u!" all over the place before each object stream. But the problem is that, to be YAML 1.1 compliant, it should actually declare a tag alias for each stream (any time a new object starts with "--- "). Declaring it once at the top and never again is only valid for the first stream; the next stream knows nothing about "!u!", so it errors out.
Also, this tag is useless. It basically appends "tag:unity3d.com,2011" to each entry in the stream. Which we don't care about. We already know it's a Unity YAML file. Why clutter the data?
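For the file to be strictly compliant, each document would have to be ended and the directive repeated, roughly like this (a sketch; the node contents are illustrative):

%YAML 1.1
%TAG !u! tag:unity3d.com,2011:
--- !u!1 &100000
GameObject:
  m_Name: Player
...
%TAG !u! tag:unity3d.com,2011:
--- !u!4 &400000
Transform:
  m_Father: {fileID: 0}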
3) The object types are given by Unity's Class ID. Here is the documentation on that:
http://docs.unity3d.com/Manual/ClassIDReference.html
Basically, each stream is defined as a new class of object...corresponding to the IDs in that link. So a "GameObject" is "1", etc. The line looks like this:
--- !u!1 &100000
So the "--- " defines a new stream. The "!u!" is an alias for "tag:unity3d.com,2011" and the "&100000" is the file ID for this object (inside this file, if something references this object, it uses this ID....remember YAML is a node-based representation, so that ID is used to denote a node connection).
The next line is the root of the YAML object, which happens to be the name of the Unity Class...example "GameObject". So it turns out we don't actually need to translate from Class ID to Human Readable node type. It's right there. If you ever need to use it, just take the root node. And if you need to construct a YAML object for Unity, just keep a dictionary around based on that documentation link to translate "GameObject" to "1", etc.
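If you ever do need that mapping, a small hand-maintained dictionary is enough (a sketch; the IDs come from the documentation link and the file header above):

# Partial map from Unity class name to Class ID
UNITY_CLASS_IDS = {'GameObject': 1, 'Transform': 4, 'AnimationClip': 74}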
The other problem is that most YAML parsers (PyYAML is the one I tested) only support 3 types of YAML objects out of the box:
Scalar
Sequence
Mapping
You can define/extend custom nodes. But this amounts to hand-writing your own YAML parser, because you have to define EXPLICITLY how each YAML constructor is created and what it outputs. Why would I use a library like PyYAML, then go ahead and write my own parser to read these custom nodes? The whole point of using a library is to leverage previous work and get all that functionality from day one. I spent 2 days trying to make a new constructor for each class ID in Unity. It never worked, and I got into the weeds trying to build the constructors correctly.
THE GOOD NEWS/SOLUTION:
Turns out, all the Unity nodes I've ever run into so far are basic "Mapping" nodes in YAML. So you can throw away the custom node mapping and just let PyYAML auto-detect the node type. From there, everything works great!
In PyYAML, you can pass a file object or a string. So, my solution was to write a simple 5-line pre-parser to strip out the bits that confuse PyYAML (the bits where Unity's syntax is non-compliant) and feed this new string to PyYAML.
1) Remove line 2 entirely, or just ignore it:
%TAG !u! tag:unity3d.com,2011:
We don't care. We know it's a unity file. And the tag does nothing for us.
2) For each stream declaration, remove the tag alias ("!u!") and remove the class ID. Leave the fileID. Let PyYAML auto-detect the node as a Mapping node.
--- !u!1 &100000
becomes...
--- &100000
3) The rest, output as is.
The code for the pre-parser looks like this:
def removeUnityTagAlias(filepath):
    """
    Name: removeUnityTagAlias()

    Description: Loads a file object from a Unity textual scene file, which is in a pseudo-YAML style, and strips the
    parts that are not YAML 1.1 compliant. Then returns a string as a stream, which can be passed to PyYAML.

    Essentially removes the "!u!" tag directive, the class type and the "&" file ID directive. PyYAML seems to handle
    the rest just fine after that.

    Returns: String (YAML stream as string)
    """
    result = str()
    sourceFile = open(filepath, 'r')

    for lineNumber, line in enumerate(sourceFile.readlines()):
        if line.startswith('--- !u!'):
            # Remove the tag and class ID, but keep the file ID (the trailing
            # newline is still attached to the split result).
            result += '--- ' + line.split(' ')[2]
        else:
            # Just copy the contents...
            result += line

    sourceFile.close()
    return result
To create a PyYAML object from a Unity textual scene file, call your pre-parser function on the file:
import yaml

# This fixes Unity's YAML %TAG alias issue.
fileToLoad = '/Users/vlad.dumitrascu/<SOME_PROJECT>/Client/Assets/Gear/MeleeWeapons/SomeAsset_test.prefab'
UnityStreamNoTags = removeUnityTagAlias(fileToLoad)

ListOfNodes = list()

for data in yaml.load_all(UnityStreamNoTags, Loader=yaml.SafeLoader):
    ListOfNodes.append(data)

# Example: print each object's name and type
for node in ListOfNodes:
    root = next(iter(node))  # the root key is the Unity class name, e.g. "GameObject"
    if 'm_Name' in node[root]:
        print('Name: ' + node[root]['m_Name'] + ' NodeType: ' + root)
    else:
        print('Name: ' + 'No Name Attribute' + ' NodeType: ' + root)
Hope that helps!
-Vlad
PS. To answer the next issue in making this usable:
You also need to walk the entire project directory and parse all ".meta" files for the "GUID", which is Unity's inter-file reference. So, when you see a reference in a Unity YAML file for something like:
m_Materials:
- {fileID: 2100000, guid: 4b191c3a6f88640689fc5ea3ec5bf3a3, type: 2}
That file is somewhere else. And you can recursively open that one to find out any dependencies.
I just ripped through the game project and saved a dictionary of GUID:Filepath Key:Value pairs which I can match against.
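A rough sketch of building that dictionary (assuming each .meta file is plain YAML with a top-level guid key):

import os
import yaml

def build_guid_map(project_root):
    """Walk a Unity project and map each GUID to the asset its .meta file describes."""
    guid_map = {}
    for dirpath, _, filenames in os.walk(project_root):
        for name in filenames:
            if not name.endswith('.meta'):
                continue
            meta_path = os.path.join(dirpath, name)
            with open(meta_path, 'r') as f:
                meta = yaml.load(f, Loader=yaml.SafeLoader)
            if meta and 'guid' in meta:
                # The asset itself is the same path minus the ".meta" suffix
                guid_map[meta['guid']] = meta_path[:-len('.meta')]
    return guid_map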

Specifying anchor names in reST

I'm creating HTML from reST using the rst2html tool which comes with docutils. It seems that the code already assigns id attributes to the individual sections, which can be used as fragment identifiers in a URL, i.e. as anchors to jump to a specific part of the page. Those id values are based on the text of the section headline. When I change the wording of that headline, the identifier will change as well, rendering old URLs invalid.
Is there a way to specify the name to use as an identifier for a given section, so that I can edit the headline without invalidating links? Would there be a way if I were to call the docutils publisher myself, from my own script?
I don't think you can set an explicit id in reST sections, but I could be mistaken.
If you'd rather have numbered ids, which will depend on the ordering of the sections in the document tree rather than on their titles, you can do it with a small change to the document.set_id() method in docutils/nodes.py (at line 997 in my version).
Here is the patch:
    def set_id(self, node, msgnode=None):
        for id in node['ids']:
            if id in self.ids and self.ids[id] is not node:
                msg = self.reporter.severe('Duplicate ID: "%s".' % id)
                if msgnode != None:
                    msgnode += msg
        if not node['ids']:
-           for name in node['names']:
-               id = self.settings.id_prefix + make_id(name)
-               if id and id not in self.ids:
-                   break
-           else:
+           if True:  # forcing numeric ids
                id = ''
                while not id or id in self.ids:
                    id = (self.settings.id_prefix +
                          self.settings.auto_id_prefix + str(self.id_start))
                    self.id_start += 1
            node['ids'].append(id)
            self.ids[id] = node
        return id
I just tested it and it generates the section ids as id1, id2...
If you don't want to change this system-wide file, you can probably monkey-patch it from a custom rst2html command.
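Such a monkey-patch could look roughly like this (a hypothetical wrapper script; it hides section names from set_id so the numeric fallback branch runs, then restores them, and it may need adjusting for your docutils version):

# my_rst2html.py -- hypothetical wrapper around rst2html
from docutils import nodes
from docutils.core import publish_cmdline

_original_set_id = nodes.document.set_id

def numeric_set_id(self, node, msgnode=None):
    names = node['names']
    node['names'] = []  # force the numeric fallback branch in set_id
    try:
        return _original_set_id(self, node, msgnode)
    finally:
        node['names'] = names  # restore names for reference resolution

nodes.document.set_id = numeric_set_id

publish_cmdline(writer_name='html')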
I'm not sure if I really understand your question.
You can create explicit hyperlink targets to arbitrary locations in your document which can be used to reference these locations independent of the implicit hyperlink targets created by docutils:
.. _my_rstfile:

------------------
This is my rstfile
------------------

.. _a-section:

First Chapter
-------------

This is a link to a-section_ which is located in my_rstfile_.
As it seems that you want to create links between multiple rst files, I would however advise using Sphinx, as it can handle references to arbitrary locations between different files and has some more advantages, like a toctree and theming. You can use Sphinx not only for source code documentation, but for general text processing. An example is the Sphinx documentation itself (there are hundreds of other examples on readthedocs).
Setting up Sphinx should be simple using sphinx-quickstart. You can simply add your existing rst files to the toctree in index.rst and run make html. If you want to document Python code, you can use sphinx-apidoc, which will automatically generate an API documentation.
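For example, a minimal toctree in index.rst pulling in two existing files (the file names are placeholders):

.. toctree::
   :maxdepth: 2

   intro
   usage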
I made a Sphinx extension to solve this problem. The extension takes the preceding internal target and uses that as the section's ID. Example (from bmu's answer):
.. _a-section:

First Chapter
-------------
The permalink on "First Chapter" would point to #a-section instead of #first-chapter. If there are multiple targets, it'll take the last one.
Link to the extension: https://github.com/GeeTransit/sphinx-better-subsection
