Sphinx extension: literal block with leading and/or trailing blank lines? - python

As far as I can tell, it is not possible to create a literal text block (e.g. with the code-block directive) that starts or ends with a blank line, because this would be ambiguous with regard to the reStructuredText syntax.
That's OK.
But now I want to create a custom directive that uses docutils's literal_block() node, and I want (within the code of my directive) to add empty lines at the beginning and/or end of the directive's contents.
Since this isn't possible in reStructuredText syntax, I'm planning to use the directive's options to specify the number of blank lines, but that's not my problem and not part of my question. Just in case you're wondering ...
Here's a minimal example of what I want to do:
from docutils import nodes
from docutils.parsers.rst import Directive

class MyDirective(Directive):

    has_content = True

    def run(self):
        # Add two blank lines before and after the directive's content.
        text = '\n\n' + '\n'.join(self.content.data) + '\n\n'
        node = nodes.literal_block(text, text)
        print(node)
        return [node]

def setup(app):
    app.add_directive('mydirective', MyDirective)
It can be used like this:
.. mydirective::

    Hello, world!
This works, but the newlines I added in the directive are somehow swallowed by Sphinx (in both HTML and LaTeX output).
How can I avoid that?
The newlines are actually stored in the node object (as can be seen in the output of print()), but they seem to get lost somewhere later during Sphinx processing.
I don't know enough about the Sphinx machinery to track this down on my own, so any help would be very much appreciated!

I would rather try with CSS margin-top and margin-bottom properties.

I found an answer to my own question, but it is far more complicated than I hoped ...
I created a custom node class and added a literal_block instance as a child node.
I'm saving the number of empty lines as attributes of the custom node class.
Then I created "visit" and "depart" functions (actually only the latter) for HTML and LaTeX that take the numbers from the node attributes and do some inelegant string replacement on self.body, fumbling the newlines into place.
This works fine for both HTML and LaTeX but I'd be happy to hear about a more elegant solution!

Related

How to check for key value metadata in markdown

I need to check if my input, formatted using markdown, has key-value pair metadata at the beginning, and then insert text after the whole metadata block.
I look for a : in the first line and if found, split the input string at the first newline and add my stuff.
Now, if markdown_content.splitlines()[0].find(':') >= 0: obviously fails when there's no metadata at the beginning, but something else containing a : instead.
Examples
Input with metadata:
page title: fancypagetitle
something else: another value
# Heading
Text
Input without metadata, but with a :
This is a [link](http://www.stackoverflow.com)
# Heading
Text
My question is: How do I check if a metadata block is present and in case it is, add something in between metadata and the remaining markdown.
Definition of metadata
The keywords are case-insensitive and may consist of letters, numbers, underscores and dashes and must end with a colon. The values consist of anything following the colon on the line and may even be blank.
If a line is indented by 4 or more spaces, that line is assumed to be an additional line of the value for the previous keyword. A keyword may have as many lines as desired.
The first blank line ends all meta-data for the document. Therefore, the first line of a document must not be blank. All meta-data is stripped from the document prior to any further processing by Markdown.
Source: https://pythonhosted.org/Markdown/extensions/meta_data.html
Have you considered looking at the source code for the meta data extension to see how it's done?
The regex used is:
META_RE = re.compile(r'^[ ]{0,3}(?P<key>[A-Za-z0-9_-]+):\s*(?P<value>.*)')
Of course there is also the regex for secondary lines:
META_MORE_RE = re.compile(r'^[ ]{4,}(?P<value>.*)')
Note that those regular expressions are much more specific than your check and are much less likely to produce a false positive. The extension then splits the document into lines, loops through them comparing each against those regexes, and breaks out of the loop on the first line that does not match (which may or may not be a blank line).
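The loop described above can be sketched roughly as follows. This is a minimal sketch, not the extension's actual code: the two regexes are the ones quoted above, while count_meta_lines() and insert_after_meta() are hypothetical helper names, and leaving documents without metadata untouched is an assumption about what you want. Note that a key containing a space, such as "page title" in the question's sample, does not match META_RE as quoted, so the example uses single-word keys.

```python
import re

# Regexes quoted above from the Python-Markdown meta-data extension.
META_RE = re.compile(r'^[ ]{0,3}(?P<key>[A-Za-z0-9_-]+):\s*(?P<value>.*)')
META_MORE_RE = re.compile(r'^[ ]{4,}(?P<value>.*)')


def count_meta_lines(lines):
    """Count the leading lines that look like a metadata block."""
    count = 0
    for line in lines:
        if META_RE.match(line):
            count += 1
        elif count and META_MORE_RE.match(line):
            # Indented continuation of the previous key's value.
            count += 1
        else:
            break
    return count


def insert_after_meta(text, insertion):
    """Hypothetical helper: put `insertion` between metadata and body."""
    lines = text.splitlines()
    n = count_meta_lines(lines)
    if n == 0:
        return text  # assumption: leave documents without metadata alone
    if n < len(lines) and not lines[n].strip():
        n += 1  # keep the blank line that ends the metadata block
    return '\n'.join(lines[:n] + [insertion] + lines[n:])


with_meta = "Title: fancypagetitle\nAuthor: someone\n\n# Heading\nText"
without_meta = "This is a [link](http://www.stackoverflow.com)\n# Heading\nText"

print(insert_after_meta(with_meta, "INSERTED"))
print(insert_after_meta(without_meta, "INSERTED"))
```

A first line such as a bare URL would still be taken for a key, so this only reduces false positives rather than eliminating them.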
If you look at that code, you will notice a new feature that will be available in the next release: support is being added for optional YAML-style delimiters. If you are comfortable using the latest (unreleased) development code, you could wrap your metadata in YAML delimiters, which might make it a little easier to find the end of the metadata.
For example, your example document above would then look like this (note that I used the optional end-specific delimiter (...), which more clearly marks the end):
---
page title: fancypagetitle
something else: another value
...
# Heading
Text
That said, you would still need to be careful that you didn't get a false match (a <hr> for example). I suppose either way you would really need to re-implement everything that is in the meta data extension for your own needs. Of course, it is open source, so you can as long as you honor the license.
Sorry, but I can't give you a timeline on when the next release will happen for sure.
Oh, and it may also help to look at the description of this feature provided by MultiMarkdown which inspired the feature in Python-Markdown. That might give you a clearer picture of what might comprise meta-data.

Python PEP: blank line after function definition?

I can't find any PEP reference to this detail. Does there have to be a blank line after the function definition line?
Should I do this:
def hello_function():
    return 'hello'
or should I do this:
def hello_function():

    return 'hello'
The same question applies when docstrings are used:
this:
def hello_function():
    """
    Important function
    """
    return 'hello'
or this:
def hello_function():
    """
    Important function
    """

    return 'hello'
EDIT
This is what the PEP says on the blank lines, as commented by FoxMaSk, but it does not say anything on this detail.
Blank Lines
Separate top-level function and class definitions with two blank
lines.
Method definitions inside a class are separated by a single blank
line.
Extra blank lines may be used (sparingly) to separate groups of
related functions. Blank lines may be omitted between a bunch of
related one-liners (e.g. a set of dummy implementations).
Use blank lines in functions, sparingly, to indicate logical sections.
Python accepts the control-L (i.e. ^L) form feed character as
whitespace; Many tools treat these characters as page separators, so
you may use them to separate pages of related sections of your file.
Note, some editors and web-based code viewers may not recognize
control-L as a form feed and will show another glyph in its place.
Read Docstring Conventions.
It says that even if the function is really obvious you have to write a one-line docstring. And it says that:
There's no blank line either before or after the docstring.
So I would code something like
def hello_function():
"""Return 'hello' string."""
return 'hello'
As pointed out by @moliware, the Docstring Conventions state, under One-line Docstrings:
There's no blank line either before or after the docstring.
HOWEVER, it also says (under Multi-line Docstrings):
Insert a blank line after all docstrings (one-line or multi-line) that document a class -- generally speaking, the class's methods are separated from each other by a single blank line, and the docstring needs to be offset from the first method by a blank line.
My interpretation of all this: blank lines should never precede any docstring, and should only follow a docstring when it is for a class.
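A minimal sketch of that interpretation, reusing hello_function from the question; the Greeter class and its greet() method are made-up names added only to show the class case:

```python
def hello_function():
    """Return the string 'hello'."""
    return 'hello'


class Greeter:
    """Produce greetings."""

    def greet(self):
        """Return the greeting from hello_function()."""
        return hello_function()


print(Greeter().greet())
```

The function and method docstrings are followed directly by code, while the class docstring is offset from the first method by one blank line.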
Projects use different docstring conventions.
For example, the pandas docstring guide explicitly requires you to put triple quotes into a line of their own.
Docstrings must be defined with three double-quotes. No blank lines should be left before or after the docstring. The text starts in the next line after the opening quotes. The closing quotes have their own line (meaning that they are not at the end of the last sentence).
Making a Python script adhere to both pydocstyle and pycodestyle is a challenge. One thing that helps greatly is to write the first line of the docstring as a summary of the function or class that fits within 79 characters, including the closing period. This way you satisfy both PEP 257 (as checked by pydocstyle), which wants a period at the end of an unbroken summary line, and the 79-character limit of PEP 8 (as checked by pycodestyle).
Then, after leaving one blank line (using your editor's new-line shortcut is better than pressing Enter manually), you can write whatever you want, and from then on you only need to focus on pycodestyle. It is slightly easier to satisfy than pydocstyle, mainly because our understanding of lines and indentation often differs from what the tool sees, due to the indentation, tab, and line settings of the various code editors we use. This way you are left with pycodestyle TODOs that you understand and can fix, instead of banging your head against the wall over pydocstyle TODOs.

Documentation after members in python (with doxygen)

I am using doxygen and have the following code:
def __init__(self):
    '''
    '''
    if not '_ready' in dir(self) or not self._ready:
        self._stream = sys.stderr ##!< stream to which all output is written
        self._ready = True ##!< #internal Flag to check initialization of singleton
For some reason doxygen tells me that self._stream (Member _stream) is undocumented. Can I document it with a comment, as the doxygen documentation describes in "Putting documentation after members", and if so, what's the right way?
Edit: this seems to be related to there being no blank line between the members, for example here:
class escapeMode(object):
    '''
    Enum to represent the escape mode.
    '''
    ALWAYS = 1 ##!< Escape all values
    NECESSARY = 2 ##!< Escape only values containing separators or starting with quotation
Doxygen only complains about ALWAYS being undocumented. I would like to avoid inserting blank lines after every attribute I document this way, since that destroys the value of blank lines for separating logical blocks like loops or if statements from surrounding code.
This is currently not supported in doxygen, as previously answered here.
If you put the comment on the preceding line it will work fine:
class escapeMode(object):
    '''
    Enum to represent the escape mode.
    '''
    ## Escape all values
    ALWAYS = 1
    ## Escape only values containing separators or starting with quotation
    NECESSARY = 2
Hope that's not too late...

comment out nested triple quotes

In Python, to comment out multiple lines we use triple quotes:
def x():
    """This code will
    add 1 and 1 """
    a=1+1
But what if I have to comment out a block of code that already contains a lot of other triple-quoted blocks? For example, if I want to comment out this function fully:
"""
def x():
    """This code will
    add 1 and 1 """
    a=1+1
"""
This doesn't work. How can I comment out such blocks of code?
"In python to comment-out multiple lines we use triple quotes"
That’s just one way of doing it, and you’re technically using a string literal, not a comment. And, although it has become fairly established, this way of writing comments has the drawback you observed: you cannot comment out nested blocks [1].
Python doesn’t have nesting multiline comments; it’s as simple as that. If you want to comment out multiple lines allowing for nested comments, the only safe choice is to comment out each line.
Most editors have some command that makes commenting multiple lines out or back in easy.
[1] For a single level of nesting you can in fact use '''"""nested """''', or the other way round. But I wouldn’t recommend it.
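To illustrate the per-line approach, here is a small sketch of what such an editor command does. The comment_out() helper is a hypothetical name, and SOURCE holds the function from the question inside a '''-quoted string, which also demonstrates the single level of nesting mentioned in the footnote.

```python
import ast

# The question's function, stored in a '''-quoted string so that the
# inner """ docstring does not end the literal.
SOURCE = '''def x():
    """This code will
    add 1 and 1 """
    a=1+1
'''


def comment_out(source):
    """Prefix every line with '#', as an editor shortcut would."""
    return '\n'.join(
        '# ' + line if line.strip() else '#'
        for line in source.splitlines()
    )


commented = comment_out(SOURCE)
print(commented)

# A module that consists only of comments parses to an empty body,
# so nothing in the commented-out block is executed or even defined.
print(ast.parse(commented).body)
```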
What I often do in brief hack&slay situations is something like this below. It is not really a comment, and it does not cover all cases (because you need to have a block), but maybe it is helpful:
if 0: # disabled because *some convincing reason*
    def x():
        """This code will
        add 1 and 1 """
        a=1+1
Or, if you cannot or don't like to introduce indenting levels between the typical ones:
# disabled because *some convincing reason*
if 0: # def x():
    """This code will
    add 1 and 1 """
    a=1+1
You should use # for commenting, at the beginning of each line. This is very easy if you're using Eclipse + PyDev.
Simply select the block of code to comment, and press Ctrl + \. The same goes for uncommenting as well.
I'm sure there are such easy ways in other editors as well.
I'm taking a Udacity python programming course building a search engine. They use the triple quotes to enclose a webpage's source code as a string in the variable 'page' to be searched for all the links.
page = '''web page source code''' that is searched with a page.find()

Python XML: write " instead of &quot

I am using Python's xml minidom and all works well except that in text sequences it writes out &quot escape characters instead of ". This of course makes sense if a quote appears in a tag, but it bugs me in the text. How do I change this?
Looking at the source (Python 3.2, if it matters), this is hardcoded in the _write_data() function. You would need to modify the writexml() method of the text node class (Text in xml.dom.minidom), either by subclassing it or simply editing it, so that it doesn't call that function but instead does something similar that escapes only < and >.
If you create a subclass outside of the package (instead of copying and hacking the package to make your own custom minidom), then it looks like, with a little care, you could make things work. You would create your own subclass of the text node class, modified as above, and then, to add text to the DOM, add an instance of your new class (or replace existing text nodes with instances of that class). You would need to set the ownerDocument attribute. Perhaps simplest would be to also subclass Document and override the createTextNode() method.
But I don't see a simpler way of doing what you want. It might be best to use a better DOM implementation.
PS: I have no idea whether this behaviour is required by the XML spec or not. Update: a quick scan of http://www.w3.org/TR/2008/REC-xml-20081126/#syntax suggests that only < and & must be encoded.
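A rough sketch of the subclassing idea, under the assumptions above. RawQuoteText and RawQuoteDocument are made-up names; the overridden writexml() does its own escaping instead of calling _write_data(), and it also escapes &, in line with the note about the spec.

```python
from xml.dom import minidom


class RawQuoteText(minidom.Text):
    """Text node that escapes &, < and > but leaves double quotes alone."""

    def writexml(self, writer, indent="", addindent="", newl=""):
        data = "%s%s%s" % (indent, self.data, newl)
        # Replace & first so the entities produced below stay intact.
        data = data.replace("&", "&amp;")
        data = data.replace("<", "&lt;").replace(">", "&gt;")
        writer.write(data)


class RawQuoteDocument(minidom.Document):
    """Document whose createTextNode() hands out RawQuoteText nodes."""

    def createTextNode(self, data):
        if not isinstance(data, str):
            raise TypeError("node contents must be a string")
        node = RawQuoteText()
        node.data = data
        node.ownerDocument = self
        return node


doc = RawQuoteDocument()
root = doc.createElement("msg")
doc.appendChild(root)
root.appendChild(doc.createTextNode('say "hi" & <bye>'))
xml = root.toxml()
print(xml)
```

Text nodes created by a parser (for example via minidom.parseString) would still be plain Text instances, so this only covers nodes created through the subclassed document.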
