Python Markdown messed up math formula

Python Markdown messed up math formula - python

When converting the content from markdown to HTML format, the following math function
${\{y_{i}, x_{i1}, ..., x_{id}\}}_{i=1}^{n}$
got converted to
${{y_{i}, x_{i1}, ..., x_{id}}}<em>{i=1}^{n}$
Is there a way to tell Python's Markdown not doing this? Thanks!

Yes, the math syntax conflicts with Markdown's syntax. You need to tell the parser to not treat that as Markdown.
You have a few different ways to do that:
Wrap the Math function in raw HTML. Markdown is not processed inside raw block-level HTML. So wrap your math in a div:
<div>${\{y_{i}, x_{i1}, ..., x_{id}\}}_{i=1}^{n}$</div>
Use a third-party extension (also here) which offers a way to mark up a section of a document as Math. You need to install one of the extensions and then tell Python-Markdown to use it.For example, to install Python-Markdown-Math, do the following:
pip install python-markdown-math
Then, in your Python code, you can tell Markdown to use the extension:
markdown.markdown(src, extensions=['mdx-math'])
Note that in order to avoid conflicts with dollar signs used for for designating money (normal usage), most of the extensions do not support using single dollar sign as a wrapper. In that case, you will need to use double-dollar signs ($$...$$). Or you could turn on support for single dollar signs like this:
markdown.markdown(
src,
extensions=['mdx_math'],
extension_configs={
'mdx_math': {'enable_dollar_delimiter': True}
}
)
As an aside, the Arithmatex Extension offers support for single dollar signs by default. However, it is part of a larger package of extensions. If you only need the one, Python-Markdown-Math will serve you just fine (each is developed by a different member of the Python-Markdown team).

Related

How to auto wrap the VSCode python intelliSence provided by Pylance?

PyLance response:
Not converting to markdown wouldn't be a good idea (as it prevents us from using markdown at all in the tooltips). VS Code's plaintext support is broken until 1.52, but maybe then we could add a toggle to say "use plaintext only".
I'm using VSCode to write python, and using Pylance to provide intelliSence. I wonder if it can provide auto wrapped information in a neat way? The intellisense information currently provided mixed all things in a single line which makes it hard to see.
More specifically, the figure below shows the intellisense without auto wrap. I would like the Args: and the following information shows exactly as the green comment writes (each line is one parameter with its explanation). How can I achieve it?
figure 1: intellisense of a user defined class FDNN without auto wrap
figure 2: intellisense of a pytorch class nn.Linear without auto wrap
EDIT 1:
removing the r before comment doesn't work
EDIT 2:
adding - renders better than plain text, but face with _ escape problem.

The solution is simple: Remove the r in front of your docstring
Edit
I have tried the r with a doc string for a function but I can't reproduce the behavior.
If you format the doc string as a kind of Markdown it will display better only it has problems with _ in variable names.
Underline header lines with - (minus) and the text is rendered reasonable.
class FDNN:
"""
Applies a fused fuzzy .....
Args
----
input_size: size of input vector
memfcn: type of membership functions
memparalist: list of tuples of membership functions
"""
def __init__(self):
pass
For functions the rendering of the doc string after you type the opening ( is different, it is used as literal text in the arguments tooltip.
This might be a reason to create an issue for VSC. The descriptions in the different Providers are interpreted differently and should be possible to mark them as plain text or "Markdown"

common lisp equivalent of a python idiom

How do I run an equivalent of this Python command in Lisp
from lib import func
For example, I want to use the split-sequence package, and in particular I only want the split-sequence method from that package.
Currently, I have to use it as (split-sequence:split-sequence #\Space "this is a string").
But I what I want to do is (split-sequence #\Space "this is a string").
How do I get access to the function directly without qualifying it with the package name?

What you want to do is simply:
(import 'split-sequence:split-sequence)
This works fine in a REPL, but if you want to organize your symbols, you'd better rely on packages.
(defpackage #:my-package
(:use #:cl)
(:import-from #:split-sequence
#:split-sequence))
The first ̀split-sequence is the package, followed by all the symbols that should be imported. In DEFPACKAGE forms, people generally use either keywords or uninterned symbols like above in order to avoid interning symbols in the current package. Alternatively, you could use strings, because only the names of symbols are important:
(defpackage "MY-PACKAGE"
(:use "CL")
(:import-from "SPLIT-SEQUENCE" "SPLIT-SEQUENCE"))

Python markdown edge case: /* */

Is there a way to get Python to interpret Markdown the same way as it is interpreted here on stackoverflow:
This is a C comment: /* */ tada!
and on github? https://gist.github.com/jason-s/fc81280dc6108f9ec3a8
Python's markdown module interprets the * * as italics:
>>> import markdown
>>> markdown.markdown('This is a C comment: /* */ tada!')
u'<p>This is a C comment: /<em> </em>/ tada!</p>'
(Babelmark 2 shows some of the differences. Looks like there are different interpretations of the markdown syntax.)

The /* */ syntax is not standard Markdown. In fact, it is not mentioned at all in the syntax rules. Therefore, it is less likely to be handled consistently among different Markdown implementations.
If it is a C comment, then it is "code" and should probably be marked up as such. Either in a code block or using inline code backticks (`/* */`). As mentioned in a comment to the OP, it could also be escaped with backslashes if you really don't want it marked up as code. Personally, I would instruct the author to fix their documents (regardless of parser behavior).
In fact, the Markdown parsers that do ignore it do so by accident. In an effort to avoid matching a few edge cases that should not be interpreted as emphasis, they require a word boundary before the opening asterisk (but not after it) and a word boundary after the closing asterisk (but not before it) to consider is as emphasis. Because the C comment has a slash before the opening asterisk (and a space after it) and a slash after the closing asterisk (and a space before it), some parsers do not see it as emphasis. I suspect you will find that those same parsers fail to identify a few edge cases as emphasis that should be. And as the Syntax Rules are silent on these edge cases, each implementation gets them slightly different. I would even go so far as to say that the implementations that do not see that as emphasis are potentially in the wrong here. But this is not the place to debate that.
That said, you are using Python-Markdown, which has a comprehensive Extension API. If an existing third party extension does not already exist (see below), you can create your own. You may add your own pattern to match the C comment specifically and handle it however you like. Or you may override the parser's default handling of emphasis and make it match some other implementation who's behavior you desire.
Actually, the BetterEm Extension (which, for some reason is not on the list of third party extensions) might do the later and give you the behavior you want. Unfortunately, it does not ship by itself, but as part of a larger package which includes multiple extensions. Of course, you only need to use the one. To get it working you need to install it. Unfortunately, it does not appear to be hosted on PyPI, so you'll have to download it directly from GitHub. The following command should download and install it all in one go:
pip install https://github.com/facelessuser/pymdown-extensions/archive/master.zip
Once you have successfully installed it just do the following in Python:
>>> import markdown
>>> html = markdown.markdown(yourtext, extensions=['pymdownx.betterem'])
Or from the commandline:
python -m markdown -x 'pymdownx.betterem' yourinputfile.md > output.html
Note: I have not tested the BetterEm Extension. It may or may not give you the behavior you want. According to the docs, "the behavior will be very similar in feel to GFM bold and italic (but not necessarily exact)."

well, it's ugly, but this post-Markdown-processing patch seems to do what I want.
>>> import markdown
>>> mdtext = 'This is a C comment: /* */ tada!'
>>> mdhtml = markdown.markdown(mdtext)
>>> mdhtml
u'<p>This is a C comment: /<em> </em>/ tada!</p>'
>>> import re
>>> mdccommentre = re.compile('/<em>( | .* )</em>/')
>>> mdccommentre.sub('/*\\1*/',mdhtml)
u'<p>This is a C comment: /* */ tada!</p>'

Programmatically converting/parsing LaTeX code to plain text

I have a couple of code projects in C++/Python in which LaTeX-format descriptions and labels are used to generate PDF documentation or graphs made using LaTeX+pstricks. However, we also have some plain text outputs, such as an HTML version of the documentation (I already have code to write minimal markup for that) and a non-TeX-enabled plot renderer.
For these I would like to eliminate the TeX markup that is necessary for e.g. representing physical units. This includes non-breaking (thin) spaces, \text, \mathrm etc. It would also be nice to parse down things like \frac{#1}{#2} into #1/#2 for the plain text output (and use MathJax for the HTML). Due to the system that we've got at the moment, I need to be able to do this from Python, i.e. ideally I'm looking for a Python package, but a non-Python executable which I can call from Python and catch the output string would also be fine.
I'm aware of the similar question on the TeX StackExchange site, but there weren't any really programmatic solutions to that: I've looked at detex, plasTeX and pytex, which they all seem a bit dead and don't really do what I need: programmatic conversion of a TeX string to a representative plain text string.
I could try writing a basic TeX parser using e.g. pyparsing, but a) that might be pitfall-laden and help would be appreciated and b) surely someone has tried that before, or knows of a way to hook into TeX itself to get a better result?
Update: Thanks for all the answers... it does indeed seem to be a bit of an awkward request! I can make do with less than general parsing of LaTeX, but the reason for considering a parser rather than a load of regexes in a loop is that I want to be able to handle nested macros and multi-arg macros nicely, and get the brace matching to work properly. Then I can e.g. reduce txt-irrelevant macros like \text and \mathrm first, and handle txt-relevant ones like \frac last... maybe even with appropriate parentheses! Well, I can dream... for now regexes are not doing such a terrible job.

I understand this is an old post, but since this post comes up often in latex-python-parsing searches (as evident by Extract only body text from arXiv articles formatted as .tex), leaving this here for folks down the line: Here's a LaTeX parser in Python that supports search over and modification of the parse tree, https://github.com/alvinwan/texsoup. Taken from the README, here is sample text and how you can interact with it via TexSoup.
from TexSoup import TexSoup
soup = TexSoup("""
\begin{document}
\section{Hello \textit{world}.}
\subsection{Watermelon}
(n.) A sacred fruit. Also known as:
\begin{itemize}
\item red lemon
\item life
\end{itemize}
Here is the prevalence of each synonym.
\begin{tabular}{c c}
red lemon & uncommon \\
life & common
\end{tabular}
\end{document}
""")
Here's how to navigate the parse tree.
>>> soup.section # grabs the first `section`
\section{Hello \textit{world}.}
>>> soup.section.name
'section'
>>> soup.section.string
'Hello \\textit{world}.'
>>> soup.section.parent.name
'document'
>>> soup.tabular
\begin{tabular}{c c}
red lemon & uncommon \\
life & common
\end{tabular}
>>> soup.tabular.args[0]
'c c'
>>> soup.item
\item red lemon
>>> list(soup.find_all('item'))
[\item red lemon, \item life]
Disclaimer: I wrote this lib, but it was for similar reasons. Regarding the post by Little Bobby Tales (regarding def), TexSoup doesn't handle definitions.

A word of caution: It is much more difficult to write a complete parser for plain TeX than what you might think. The TeX-level (not LaTeX) \def command actually extends TeX's syntax. For example, \def\foo #1.{{\bf #1}} will expand \foo goo. into goo - Notice that the dot became a delimiter for the foo macro! Therefore, if you have to deal with any form of TeX, without restrictions on which packages may be used, it is not recommended to rely on simple parsing. You need TeX rendering. catdvi is what I use, although it is not perfect.

Try detex (shipped with most *TeX distributions), or the improved version: http://code.google.com/p/opendetex/
Edit: oh, I see you tried detex already. Still, opendetex might work for you.

I would try pandoc [enter link description here][1]. It is written in Haskell, but it is a really nice latex 2 whatever converter.
[1]: http://johnmacfarlane.net/pandoc/index.html .

As you're considering using TeX itself for doing the rendering, I suspect that performance is not an issue. In this case you've got a couple of options: dvi2txt to fetch your text from a single dvi file (be prepared to generate one for each label) or even rendering dvi into raster images, if it's ok for you - that's how hevea or latex2html treats formulas.

Necroing this old thread, but found this nifty library called pylatexenc that seems to do almost exactly what the OP was after:
from pylatexenc.latex2text import LatexNodes2Text
LatexNodes2Text().latex_to_text(r"""\
\section{Euler}
\emph{This} bit is \textbf{very} clever:
\begin{equation}
\mathrm{e}^{i \pi} + 1 = 0 % wow!!
\end{equation}
where
\[
\mathrm{e} = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n
\]
""")
which produces
§ EULER
This bit is very clever:
e^i π + 1 = 0
where
e = lim_n →∞(1 + 1/n)^n
As you can see, the result is not perfect for the equations, but it does a great job of stripping and converting all the tex commands.

Building the other post Eduardo Leoni, I was looking at pandoc and I see that it comes with a standalone executable but also on this page it promises a way to build to a C-callable system library. Perhaps this is something that you can live with?

LaTeX-format descriptions and labels are used to generate PDF documentation or graphs made using LaTeX+pstricks
This is your mistake. You shouldn't have done that.
Use RST or some other -- better -- markup language.
Use Docutils to create LaTeX and HTML from the RST source.

Common coding style for Python?

I'm pretty new to Python, and I want to develop my first serious open source project. I want to ask what is the common coding style for python projects. I'll put also what I'm doing right now.
1.- What is the most widely used column width? (the eternal question)
I'm currently sticking to 80 columns (and it's a pain!)
2.- What quotes to use? (I've seen everything and PEP 8 does not mention anything clear)
I'm using single quotes for everything but docstrings, which use triple double quotes.
3.- Where do I put my imports?
I'm putting them at file header in this order.
import sys
import -rest of python modules needed-
import whatever
import -rest of application modules-
<code here>
4.- Can I use "import whatever.function as blah"?
I saw some documents that disregard doing this.
5.- Tabs or spaces for indenting?
Currently using 4 spaces tabs.
6.- Variable naming style?
I'm using lowercase for everything but classes, which I put in camelCase.
Anything you would recommend?

PEP 8 is pretty much "the root" of all common style guides.
Google's Python style guide has some parts that are quite well thought of, but others are idiosyncratic (the two-space indents instead of the popular four-space ones, and the CamelCase style for functions and methods instead of the camel_case style, are pretty major idiosyncrasies).
On to your specific questions:
1.- What is the most widely used column width? (the eternal question)
I'm currently sticking to 80 columns
(and it's a pain!)
80 columns is most popular
2.- What quotes to use? (I've seen everything and PEP 8 does not mention
anything clear) I'm using single
quotes for everything but docstrings,
which use triple double quotes.
I prefer the style you're using, but even Google was not able to reach a consensus about this:-(
3.- Where do I put my imports? I'm putting them at file header in this
order.
import sys import -rest of python
modules needed-
import whatever import -rest of
application modules-
Yes, excellent choice, and popular too.
4.- Can I use "import whatever.function as blah"? I saw some
documents that disregard doing this.
I strongly recommend you always import modules -- not specific names from inside a module. This is not just style -- there are strong advantages e.g. in testability in doing that. The as clause is fine, to shorten a module's name or avoid clashes.
5.- Tabs or spaces for indenting? Currently using 4 spaces tabs.
Overwhelmingly most popular.
6.- Variable naming style? I'm using lowercase for everything but classes,
which I put in camelCase.
Almost everybody names classes with uppercase initial and constants with all-uppercase.

1.- Most everyone has a 16:9 or 16:10 monitor now days. Even if they don't have a wide-screen they have lots of pixels, 80 cols isn't a big practical deal breaker like it was when everyone was hacking at the command line in a remote terminal window on a 4:3 monitor at 320 X 240. I usually end the line when it gets too long, which is subjective. I am at 2048 X 1152 on a 23" Monitor X 2.
2.- Single quotes by default so you don't have to escape Double quotes, Double quotes when you need to embed single quotes, and Triple quotes for strings with embedded newlines.
3.- Put them at the top of the file, sometimes you put them in the main function if they aren't needed globally to the module.
4.- It is a common idiom to rename some modules. A good example is the following.
try:
# for Python 2.6.x
import json
except ImportError:
# for previous Pythons
try:
import simplejson as json
except ImportError:
sys.exit('easy_install simplejson')
but the preferred way to import just a class or function is from module import xxx with the optional as yyy if needed
5.- Always use SPACES! 2 or 4 as long as no TABS
6.- Classes should up UpperCaseCamelStyle, variables are lowercase sometimes lowerCamelCase or sometimes all_lowecase_separated_by_underscores, as are function names. "Constants" should be ALL_UPPER_CASE_SEPARATED_BY_UNDERSCORES
When in doubt refer to the PEP 8, the Python source, existing conventions in a code base. But the most import thing is to be internally consistent as possible. All Python code should look like it was written by the same person when ever possible.

Since I'm really crazy about "styling" I'll write down the guidelines that I currently use in a near 8k SLOC project with about 35 files, most of it matches PEP8.
PEP8 says 79(WTF?), I go with 80 and I'm used to it now. Less eye movement after all!
Docstrings and stuff that spans multiple lines in '''. Everything else in ''. Also I don't like double quotes, I only use single quotes all the time... guess that's because I came form the JavaScript corner, where it's just easier too use '', because that way you don't have to escape all the HTML stuff :O
At the head, built-in before custom application code. But I also go with a "fail early" approach, so if there's something that's version depended(GTK for example) I'd import that first.
Depends, most of the times I go with import foo and from foo import, but there a certain cases(e.G. the name is already defined by another import) were I use from foo import bar as bla too.
4 Spaces. Period. If you really want to use tabs, make sure to convert them to spaces before committing when working with SCM. BUT NEVER(!) MIX TABS AND SPACES!!! It can AND WILL introduce horrible bugs.
some_method or foo_function, a CONSTANT, MyClass.
Also you can argue about indentation in cases where a method call or something spans multiple lines, and you can argue about which line continuation style you will use. Either surround everything with () or do the \ at the end of the line thingy. I do the latter, and I also place operators and other stuff at the start of the next line.
# always insert a newline after a wrapped one
from bla import foo, test, goo, \
another_thing
def some_method_thats_too_long_for_80_columns(foo_argument, bar_argument, bla_argument,
baz_argument):
do_something(test, bla, baz)
value = 123 * foo + ten \
- bla
if test > 20 \
and x < 4:
test_something()
elif foo > 7 \
and bla == 2 \
or me == blaaaaaa:
test_the_megamoth()
Also I have some guidelines for comparison operations, I always use is(not) to check against None True False and I never do an implicit boolean comparison like if foo:, I always do if foo is True:, dynamic typing is nice but in some cases I just want to be sure that the thing does the right thing!
Another thing that I do is to never use empty strings! They are in a constants file, in the rest of the code I have stuff like username == UNSET_USERNAME or label = UNSET_LABEL it's just more descriptive that way!
I also have some strict whitespace guidelines and other crazy stuff, but I like it(because I'm crazy about it), I even wrote a script which checks my code:
http://github.com/BonsaiDen/Atarashii/blob/master/checkstyle
WARNING(!): It will hurt your feelings! Even more than JSLint does...
But that's just my 2 cents.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.