Making Sphinx produce untypogrified code blocks in PDF output

Code blocks look ugly (check the quotes) in PDF output:
I use version 1.1.3 of Sphinx, and used the following command to produce the doc:
$ make latexpdf
Also, copying the snippet from the PDF destroys the indenting when pasting:
@view_config(route_name=’hello’)
def hello_world(request):
return Response(’Hello World!’)
I would expect this:
@view_config(route_name=’hello’)
def hello_world(request):
    return Response(’Hello World!’)
This would be even nicer:
@view_config(route_name='hello')
def hello_world(request):
    return Response('Hello World!')

Sphinx is really an excellent tool but I also have a few issues with the default PDF output of the latexpdf target.
Specifically:
Single quotes in code blocks are converted to acute style quotes which doesn't look right in source code.
The code blocks aren't indented from the main text. For me this makes them less readable.
I prefer other fonts and pygments but that is just a personal choice and can be configured.
Some of this can be fixed in the LaTeX preamble section of the Sphinx conf.py, but Sphinx converts the quotes into custom LaTeX macros, so the upquote LaTeX package can't be used to correct them.
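For the parts that conf.py can handle, the relevant settings are latex_elements and pygments_style. The fragment below is only a sketch: the font package and the Pygments style are arbitrary example choices, and upquote will not help with quotes that Sphinx has already replaced with its own macros.

```python
# conf.py (fragment). Package and style names are illustrative choices only.
latex_elements = {
    # Font package loaded in the LaTeX preamble (example choice).
    'fontpkg': r'\usepackage{palatino}',
    # Extra preamble lines; upquote only affects quotes that reach LaTeX verbatim.
    'preamble': r'\usepackage{upquote}',
}

# Pygments style used for highlighted code blocks (example choice).
pygments_style = 'friendly'
```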
After a good bit of experimentation with different config options I ended up writing a small script to modify the LaTeX source prior to building the PDF. The script is here and the output that I wanted to generate is here. (For comparison here is the default output for the same document.)
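The linked script is not reproduced here, but the general idea of post-processing the generated LaTeX can be sketched as follows. Treat it as an assumption-laden example: the macro Sphinx/Pygments emits for a straight single quote varies between versions, so \PYGZsq{} is only a placeholder to be checked against your own .tex file, and \textquotesingle requires the textcomp package. The helper names are hypothetical.

```python
from pathlib import Path

# ASSUMPTION: \PYGZsq{} stands in for whatever macro your Sphinx version emits
# for a straight single quote; inspect the generated .tex file to confirm.
# \textquotesingle needs \usepackage{textcomp} in the preamble.
REPLACEMENTS = {r"\PYGZsq{}": r"\textquotesingle{}"}


def fix_tex_text(text, replacements=REPLACEMENTS):
    """Apply literal (non-regex) replacements to LaTeX source."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text


def fix_tex_file(path, replacements=REPLACEMENTS):
    """Rewrite a generated .tex file in place before it is compiled."""
    p = Path(path)
    fixed = fix_tex_text(p.read_text(encoding="utf-8"), replacements)
    p.write_text(fixed, encoding="utf-8")
```

A possible workflow: run `make latex`, call fix_tex_file on the .tex file in _build/latex, then build the PDF from that directory (the Makefile Sphinx generated at the time provided an `all-pdf` target, which is what `make latexpdf` invoked).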
If someone has a cleaner solution, for example one that could be done entirely through the Sphinx conf.py so that the changes would be picked up by ReadTheDocs, then I would be interested.
Also, the issue with losing indentation when copying and pasting from the PDF probably isn't a Sphinx/LaTeX issue.

This is only a partial answer which may lead toward an ultimate solution. To disable typographer quotes (also known as curly or smart quotes) for HTML output in Sphinx, change the default setting in conf.py for SmartyPants from True to False.
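In conf.py this is a one-line change. The setting name below is the one Sphinx used before 1.6; later versions deprecate it.

```python
# conf.py (Sphinx < 1.6): stop SmartyPants from curling quotes in HTML output.
html_use_smartypants = False
```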
I assume that one could find the function in Sphinx that transforms quotes and use the same logic from the HTML output and apply it for PDF output.

The option proposed by Steve Piercy no longer works (since Sphinx 1.6):
Deprecated since version 1.6: Use the smart_quotes option
in the Docutils configuration file (docutils.conf) instead.
What was not obvious to me was how to apply this setting. You need to create a docutils.conf file and put it in one of these locations:
/etc/docutils.conf
./docutils.conf
~/.docutils
The easiest option is to put it where you are building the docs from - YMMV. It should contain at least this:
[general]
smart_quotes: no
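If you build in several places (a CI server, for example), a small stdlib helper can generate this file. This is a sketch with hypothetical helper names; it relies on docutils reading its config files with Python's configparser, which accepts `smart_quotes = no` as well as the colon form shown above.

```python
import configparser
import io
from pathlib import Path


def docutils_conf_text():
    """Return docutils.conf contents that switch off smart quotes."""
    cfg = configparser.ConfigParser()
    cfg["general"] = {"smart_quotes": "no"}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


def write_docutils_conf(directory="."):
    """Write docutils.conf into *directory* (e.g. where you run `make`)."""
    path = Path(directory) / "docutils.conf"
    path.write_text(docutils_conf_text(), encoding="utf-8")
    return path
```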

Related

How to auto wrap the VSCode Python IntelliSense provided by Pylance?

PyLance response:
Not converting to markdown wouldn't be a good idea (as it prevents us from using markdown at all in the tooltips). VS Code's plaintext support is broken until 1.52, but maybe then we could add a toggle to say "use plaintext only".
I'm using VSCode to write Python, with Pylance providing IntelliSense. Can it show the information auto-wrapped in a neat way? Currently the IntelliSense popup mixes everything into a single line, which makes it hard to read.
More specifically, the figures below show the IntelliSense popup without auto wrap. I would like the Args: section to display exactly as written in the (green) docstring, with one line per parameter and its explanation. How can I achieve this?
Figure 1: IntelliSense for a user-defined class FDNN without auto wrap
Figure 2: IntelliSense for the PyTorch class nn.Linear without auto wrap
EDIT 1:
Removing the r before the docstring doesn't work.
EDIT 2:
Adding - underlines renders better than plain text, but then _ causes escaping problems.

The solution is simple: remove the r in front of your docstring.
Edit
I have tried the r with a docstring for a function, but I can't reproduce the behavior.
If you format the docstring as a kind of Markdown, it displays better; the only problem is _ in variable names.
Underline header lines with - (minus) and the text renders reasonably.
class FDNN:
    """
    Applies a fused fuzzy .....
    Args
    ----
    input_size: size of input vector
    memfcn: type of membership functions
    memparalist: list of tuples of membership functions
    """
    def __init__(self):
        pass
For functions, the docstring is rendered differently after you type the opening (: it is used as literal text in the arguments tooltip.
This might be worth raising as an issue for VS Code. The descriptions in the different providers are interpreted differently, and it should be possible to mark them as plain text or Markdown.

How to let pyflakes ignore some errors?

I am using SublimePythonIDE which is using pyflakes.
There are some errors that I would like it to ignore like:
(E501) line too long
(E101) indentation contains mixed spaces and tabs
What is the easiest way to do that?

Configuring a plugin in Sublime almost always uses the same procedure: Click on Preferences -> Package Settings -> Plugin Name -> Settings-Default to open the (surprise surprise) default settings. This file generally contains all the possible settings for the plugin, usually along with comments explaining what each one does. This file cannot be modified, so to customize any settings you open Preferences -> Package Settings -> Plugin Name -> Settings-User. I usually copy the entire contents of the default settings into the user file, then customize as desired, then save and close.
In the case of this particular plugin, while it does use pyflakes (as advertised), it also uses pep8, a style checker based on the very same official PEP 8 style guide I mentioned in the comments. This matters because pyflakes does not use specific error codes, while pep8 does.
So, upon examination of the plugin's settings file, we find a "pep8_ignore" option as well as a "pyflakes_ignore" one. Since the error codes are coming from pep8, we'll use that setting:
"pep8_ignore": ["E501",  // line too long
                "E303",  // too many blank lines (3)
                "E402"   // module level import not at top of file
               ]
Please note that codes E121, E123, E126, E133, E226, E241, E242, and E704 are ignored by default because they are not unanimously accepted rules, and PEP 8 does not enforce them.
Regarding long lines:
Sometimes long lines are unavoidable. PEP 8's recommendation of 79-character lines dates back to when terminals only had 80-character-wide screens, but it persists to this day for several reasons: it's backwards-compatible with old code, some equipment still has those limitations, it looks good, it makes it easier to have multiple files open side by side on wider displays, and it is readable (something you should always keep in mind when coding). If you prefer a 90- or 100-character limit, that's fine (if your team/project agrees with it), but use it consistently, and be aware that others may use different values. If you'd like to set pep8 to a larger value than its default of 79, just modify the "pep8_max_line_length" setting.
There are many ways to either decrease the character count of lines to stay within the limit, or split long lines into multiple shorter ones. In the case of your example in the comments:
flag, message = FacebookUserController.AddFBUserToDB(iOSUserId, fburl, fbsecret, code)
you can do a couple of things:
# shorten the module/class name
fbuc = FacebookUserController
# or
import FacebookUserController as fbuc
flag, message = fbuc.AddFBUserToDB(iOSUserId, fburl, fbsecret, code)
# or eliminate it altogether
from FacebookUserController import AddFBUserToDB
flag, message = AddFBUserToDB(iOSUserId, fburl, fbsecret, code)
# split the function's arguments onto separate lines
flag, message = FacebookUserController.AddFBUserToDB(iOSUserId,
                                                     fburl,
                                                     fbsecret,
                                                     code)
# There are multiple ways of doing this, just make sure the subsequent
# line(s) are indented. You don't need to escape newlines inside of
# braces, brackets, and parentheses, but you do need to outside of them.
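The last point, implicit versus explicit line joining, in a self-contained form. The add function is a hypothetical stand-in for a long call such as AddFBUserToDB:

```python
def add(a, b, c, d):
    """Hypothetical stand-in for a long call such as AddFBUserToDB."""
    return a + b + c + d


# Inside (), [] or {}, Python joins the lines implicitly; no backslash needed.
total = add(1,
            2,
            3,
            4)

# Outside of any brackets, a trailing backslash is required to continue the line.
other = 1 + 2 + \
    3 + 4

assert total == other == 10
```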

As others suggest, heed the warnings where possible. But in cases where you can't, you can add # NOQA to the end of offending lines. Note the two spaces before the #, as that too is a style rule (E261) that will otherwise be complained about.
And if pyflakes is wrapped in flake8, that allows ignoring specific errors.
For example, put this in (or add it to) a tox.ini file in the project:
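For example (the constant is purely illustrative): a bare `# noqa` silences every flake8 warning on the line, while newer flake8 releases also accept a list of codes so that only those are suppressed:

```python
# Two spaces before "#", then "noqa" restricted to the line-length check.
LONG_URL = "https://example.com/a/deliberately/long/path/used/only/to/push/this/line/past/79/chars"  # noqa: E501
```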
[flake8]
exclude = .tox,./build
filename = *.py
ignore = E501,E101
This is possibly a duplicate of How do I get Pyflakes to ignore a statement?

How to interpret "description" as restructured text in sphinx argparse or autoprogram?

I have been using the sphinx-argparse and sphinx autoprogram modules to scrape command-line descriptions from Python scripts that use the argparse module. The output is generally fine, but the "description" part of the script is parsed as a single paragraph of text. Is there some way of passing it through a reST parser or something similar so that it at least preserves the whitespace between paragraphs?

Looks like this module is under development. I'd suggest looking at the github repository, and maybe raising an issue.
https://github.com/ribozz/sphinx-argparse
In sphinxarg/ext.py, the description is formatted with docutils.nodes.paragraph. Same with the epilog. usage, on the other hand, uses nodes.literal_block.
================
After playing around with docutils I suspect the description is entered into the doctree as
<paragraph>
Fancy *argparse* description
...
This is an attempt to use fancier formatting....
</paragraph>
which ends up in the html as
<p>
Fancy *argparse* description
...
</p>
It retains all the original whitespace, but the browser renders it as single wrapped paragraph block.
To preserve whitespace, and to act on things like emphasis and bullets, it needs to be passed through a reader and/or parser. Then its portion of the doctree will look more like:
<paragraph>Fancy <emphasis>argparse</emphasis> description</paragraph>...
<paragraph>This is an attempt to use fancier formatting. ....</paragraph>
I can do this in a standalone script with:
docutils.core.publish_doctree(description)
but I don't know how it could be done in sphinx-argparse.
In effect, sphinx-argparse treats the description as a simple paragraph, in the same spirit as the default HelpFormatter.
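For comparison, argparse itself behaves the same way on the command line: the default HelpFormatter collapses the description into one wrapped paragraph, while RawDescriptionHelpFormatter keeps it exactly as written. This stdlib-only sketch shows that difference; it does not show how sphinx-argparse renders the text, and build_parser is a hypothetical helper.

```python
import argparse

DESCRIPTION = """Fancy *argparse* description

This is an attempt to use fancier formatting."""


def build_parser(raw):
    """Build a demo parser; raw=True keeps the description's line breaks."""
    kwargs = {"formatter_class": argparse.RawDescriptionHelpFormatter} if raw else {}
    return argparse.ArgumentParser(prog="demo", description=DESCRIPTION, **kwargs)


print(build_parser(raw=False).format_help())  # paragraphs run together
print(build_parser(raw=True).format_help())   # blank line between paragraphs kept
```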

Programmatically converting/parsing LaTeX code to plain text

I have a couple of code projects in C++/Python in which LaTeX-format descriptions and labels are used to generate PDF documentation or graphs made using LaTeX+pstricks. However, we also have some plain text outputs, such as an HTML version of the documentation (I already have code to write minimal markup for that) and a non-TeX-enabled plot renderer.
For these I would like to eliminate the TeX markup that is necessary for e.g. representing physical units. This includes non-breaking (thin) spaces, \text, \mathrm etc. It would also be nice to reduce things like \frac{#1}{#2} to #1/#2 for the plain text output (and use MathJax for the HTML). Due to the system that we've got at the moment, I need to be able to do this from Python, i.e. ideally I'm looking for a Python package, but a non-Python executable which I can call from Python and catch the output string would also be fine.
I'm aware of the similar question on the TeX StackExchange site, but there weren't any really programmatic solutions there. I've looked at detex, plasTeX and pytex, which all seem a bit dead and don't really do what I need: programmatic conversion of a TeX string to a representative plain text string.
I could try writing a basic TeX parser using e.g. pyparsing, but a) that might be pitfall-laden and help would be appreciated and b) surely someone has tried that before, or knows of a way to hook into TeX itself to get a better result?
Update: Thanks for all the answers... it does indeed seem to be a bit of an awkward request! I can make do with less than general parsing of LaTeX, but the reason for considering a parser rather than a load of regexes in a loop is that I want to be able to handle nested macros and multi-arg macros nicely, and get the brace matching to work properly. Then I can e.g. reduce txt-irrelevant macros like \text and \mathrm first, and handle txt-relevant ones like \frac last... maybe even with appropriate parentheses! Well, I can dream... for now regexes are not doing such a terrible job.
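The brace-matching approach described in the update can be sketched as a tiny recursive parser instead of a loop of regexes. This is a minimal sketch under stated assumptions: it knows only a hypothetical macro table (\text, \mathrm, \textrm and \mbox are unwrapped, \frac becomes a/b), it keeps the bare name of any other macro, and it does not handle \def, optional [...] arguments, or escaped braces inside arguments.

```python
import re

# Hypothetical macro tables for illustration; extend them for your own labels.
UNWRAP = {"text", "mathrm", "textrm", "mbox"}       # \text{x} -> x
SYMBOLS = {",": " ", ";": " ", " ": " ", "!": ""}   # spacing commands \, \; \  \!
_NAME = re.compile(r"[A-Za-z]+")


def _group(s, i):
    """Return (content, end) for the brace group that opens at s[i]."""
    if i >= len(s) or s[i] != "{":
        raise ValueError(f"expected '{{' at position {i}")
    depth = 0
    for j in range(i, len(s)):
        if s[j] == "{":
            depth += 1
        elif s[j] == "}":
            depth -= 1
            if depth == 0:
                return s[i + 1:j], j + 1
    raise ValueError("unbalanced braces")


def _skip_spaces(s, i):
    while i < len(s) and s[i] == " ":
        i += 1
    return i


def _wrap(x):
    """Parenthesize a fraction operand unless it is a single alphanumeric token."""
    return x if x.isalnum() else f"({x})"


def tex_to_text(s):
    """Convert a small LaTeX subset to plain text, recursing into arguments."""
    out, i = [], 0
    while i < len(s):
        c = s[i]
        if c == "\\":
            m = _NAME.match(s, i + 1)
            if m is None:                       # control symbol, e.g. \, or \%
                sym = s[i + 1:i + 2]
                out.append(SYMBOLS.get(sym, sym))
                i += 2
                continue
            name, end = m.group(), m.end()
            if name == "frac":                  # two mandatory arguments
                num, i = _group(s, _skip_spaces(s, end))
                den, i = _group(s, _skip_spaces(s, i))
                out.append(f"{_wrap(tex_to_text(num))}/{_wrap(tex_to_text(den))}")
            elif name in UNWRAP:                # keep the argument, drop the macro
                arg, i = _group(s, _skip_spaces(s, end))
                out.append(tex_to_text(arg))
            else:                               # unknown macro: keep its bare name
                out.append(name)
                i = end
        elif c in "{}$":                        # drop grouping braces and math shifts
            i += 1
        elif c == "~":                          # non-breaking space
            out.append(" ")
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

For example, tex_to_text(r"5\,\mathrm{m}/\text{s}") returns "5 m/s", and the nested tex_to_text(r"\frac{1}{1+\frac{1}{n}}") returns "1/(1+1/n)". Because arguments are parsed recursively, txt-irrelevant macros inside a \frac are reduced before the fraction is rendered.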

I understand this is an old post, but since it comes up often in LaTeX-Python parsing searches (as evidenced by Extract only body text from arXiv articles formatted as .tex), I'm leaving this here for folks down the line. Here's a LaTeX parser in Python that supports searching and modifying the parse tree: https://github.com/alvinwan/texsoup. Taken from the README, here is sample text and how you can interact with it via TexSoup.
from TexSoup import TexSoup
soup = TexSoup(r"""
\begin{document}
\section{Hello \textit{world}.}
\subsection{Watermelon}
(n.) A sacred fruit. Also known as:
\begin{itemize}
\item red lemon
\item life
\end{itemize}
Here is the prevalence of each synonym.
\begin{tabular}{c c}
red lemon & uncommon \\
life & common
\end{tabular}
\end{document}
""")
Here's how to navigate the parse tree.
>>> soup.section # grabs the first `section`
\section{Hello \textit{world}.}
>>> soup.section.name
'section'
>>> soup.section.string
'Hello \\textit{world}.'
>>> soup.section.parent.name
'document'
>>> soup.tabular
\begin{tabular}{c c}
red lemon & uncommon \\
life & common
\end{tabular}
>>> soup.tabular.args[0]
'c c'
>>> soup.item
\item red lemon
>>> list(soup.find_all('item'))
[\item red lemon, \item life]
Disclaimer: I wrote this lib, but it was for similar reasons. Regarding the post by Little Bobby Tales about \def: TexSoup doesn't handle definitions.

A word of caution: it is much more difficult to write a complete parser for plain TeX than you might think. The TeX-level (not LaTeX) \def command actually extends TeX's syntax. For example, \def\foo #1.{{\bf #1}} will expand \foo goo. into goo set in bold. Notice that the dot became a delimiter for the \foo macro! Therefore, if you have to deal with arbitrary TeX, without restrictions on which packages may be used, it is not advisable to rely on simple parsing. You need TeX rendering. catdvi is what I use, although it is not perfect.

Try detex (shipped with most *TeX distributions), or the improved version: http://code.google.com/p/opendetex/
Edit: oh, I see you tried detex already. Still, opendetex might work for you.

I would try pandoc (http://johnmacfarlane.net/pandoc/index.html). It is written in Haskell, but it is a really nice LaTeX-to-whatever converter.
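Both detex and pandoc are command-line tools, so from Python the simplest route is to pipe the string through them and capture standard output. A minimal sketch with a hypothetical helper name; it assumes the tool reads standard input when no file is given (pandoc does; check your detex build).

```python
import subprocess


def pipe_through(cmd, text):
    """Feed *text* to *cmd* on stdin and return its stdout as a string."""
    result = subprocess.run(
        cmd, input=text, capture_output=True, text=True, check=True
    )
    return result.stdout


# Example invocations (assuming the tools are installed and on PATH):
#   pipe_through(["detex"], r"\textbf{bold} text")
#   pipe_through(["pandoc", "-f", "latex", "-t", "plain"], r"\emph{Hi} there")
```

check=True makes a failing conversion raise subprocess.CalledProcessError instead of silently returning partial output.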

As you're considering using TeX itself for the rendering, I suspect that performance is not an issue. In that case you have a couple of options: dvi2txt to extract your text from a single DVI file (be prepared to generate one for each label), or even rendering the DVI into raster images, if that's acceptable for you; that's how hevea or latex2html treat formulas.

Necroing this old thread, but I found a nifty library called pylatexenc that seems to do almost exactly what the OP was after:
from pylatexenc.latex2text import LatexNodes2Text
LatexNodes2Text().latex_to_text(r"""\
\section{Euler}
\emph{This} bit is \textbf{very} clever:
\begin{equation}
\mathrm{e}^{i \pi} + 1 = 0 % wow!!
\end{equation}
where
\[
\mathrm{e} = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n
\]
""")
which produces
§ EULER
This bit is very clever:
e^i π + 1 = 0
where
e = lim_n →∞(1 + 1/n)^n
As you can see, the result is not perfect for the equations, but it does a great job of stripping and converting all the tex commands.

Building on the other post by Eduardo Leoni: I was looking at pandoc, and I see that it comes as a standalone executable, but this page also promises a way to build it as a C-callable system library. Perhaps that is something you can live with?

LaTeX-format descriptions and labels are used to generate PDF documentation or graphs made using LaTeX+pstricks
This is your mistake. You shouldn't have done that.
Use RST or some other -- better -- markup language.
Use Docutils to create LaTeX and HTML from the RST source.

How do you enable block folding for Python comments in TextMate?

In TextMate 1.5.10 r1623, you get little arrows that allow you to fold method blocks:
Unfortunately, if you have a multi-lined Python comment, it doesn't recognize it, so you can't fold it:
def foo():
    """
    How do
    I fold
    these comments?
    """
    print("bar")
TextMate has this on their site on how to customize folding: http://manual.macromates.com/en/navigation_overview#customizing_foldings
...but I'm not skilled enough in regex to do anything about it. TextMate uses the Oniguruma regex API, and I'm using the default Python.tmbundle, updated to the newest version via GetBundles.
Does anyone have an idea of how to do this? Thanks in advance for your help! :)
For reference, these are the default foldingStartMarker and foldingStopMarker regex values for Python.tmbundle, found under the Python language in the Bundle Editor:
foldingStartMarker = '^\s*(def|class)\s+([.a-zA-Z0-9_ <]+)\s*(\((.*)\))?\s*:|\{\s*$|\(\s*$|\[\s*$|^\s*"""(?=.)(?!.*""")';
foldingStopMarker = '^\s*$|^\s*\}|^\s*\]|^\s*\)|^\s*"""\s*$';
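One way to see why the docstring in the question never folds is to run these patterns against its lines. The sketch uses Python's re as a stand-in for Oniguruma; the constructs involved (anchors, alternation, lookaheads) should behave the same in both, but that is an assumption rather than a TextMate test.

```python
import re

# The Python.tmbundle defaults quoted above, split across implicit concatenation.
START = re.compile(
    r'^\s*(def|class)\s+([.a-zA-Z0-9_ <]+)\s*(\((.*)\))?\s*:'
    r'|\{\s*$|\(\s*$|\[\s*$'
    r'|^\s*"""(?=.)(?!.*""")'
)
STOP = re.compile(r'^\s*$|^\s*\}|^\s*\]|^\s*\)|^\s*"""\s*$')

lines = ['def foo():', '    """', '    How do', '    """ Some sort of multi']
for line in lines:
    print(f"{line!r:32} start={bool(START.search(line))} stop={bool(STOP.search(line))}")
```

A line holding only `"""` matches the stop marker but never the start marker, because `(?=.)` requires text after the opening quotes on the same line. So a docstring whose opening quotes stand alone cannot begin a fold, whereas one written as `""" Some sort of multi` can.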

It appears that multi-line comment folding does work in TextMate, but you must line up your quotes exactly like so:
""" Some sort of multi
line comment, which needs quotes
in just the right places to work. """
That seems to do it:

According to this TextMate mailing list thread, if you follow it to the end, proper code folding for Python is not supported. Basically, the regular expressions in foldingStartMarker and foldingStopMarker cannot share captures, so the amount of indentation at the beginning of the "end fold" cannot be matched to that of the "begin fold".
The issue was never finally and officially addressed by TextMate's creator, Allan Odgaard; however, since the thread is from 2005, I assume it is a dead issue and not one that will be supported.
