XML embedded in Python docstring, use syntax highlighting

XML embedded in Python docstring, use syntax highlighting - python

I'm working on a Python project where there are some blocks of XML defined inside docstrings.
The code contains strings like this
xml_str = """<a>
<b>text</b>
</a>"""
In reality, the blocks of embedded XML are much larger. The problem is this XML becomes difficult to read. Since an IDE renders the XML as a String in one color, the text cannot be visually parsed as easily as if it had normal XML syntax highlighting applied.
I'm looking for an editor which will either natively support syntax highlighting in Strings, or where such a feature can be hacked in easily. This feature would be really cool to have, so I'm prepared to invest some time to make it happen.
I realize there are some arguments for why embedding XML in such a manner would be bad practice. I would agree, except in this situation. I have found a way to solve a problem very effectively by placing XML in a Python file directly instead of into an external resource.
Edit
I normally use PyDev for Eclipse, so I would be biased to solutions using it. Although I am prepared to swtich IDEs for this project if necessary.

Use syntax highlighting plugin http://colorer.sf.net/eclipsecolorer/ for eclipse. It Automatic folding support for mostly all (200+) supported languages.

Related

Is there a way to add Julia, R and python to a single text file like R markdown or a notebook that could be manipulated as a text file?

Stated briefly: I would like to have a text file where I can smoothly switch among R, python and Julia. Of importance, I am looking for a way to run rather than just display code
I know it is possible to add python (and many other languages) to R markdown http://goo.gl/4w8XIb , but not sure I could add Julia. Also possible to use notebooks like Beaker http://beakernotebook.com/ with all three languages (and more) , but my issue with notebooks is that they are not nearly as fast to manipulate compared to what can be done with a text file in an editor environment (sublime, emacs, vim, atom ...). I know very little about notebooks, and the ones I know of are represented as json files, but manipulating a json file to write a report is all but user friendly.
I'm probably missing the obvious, but any other way to do this? thanks

I recently created an R package JuliaCall, and it can be used as julia engine in R Markdown document, see https://non-contradiction.github.io/JuliaCall/articles/JuliaCall_in_RMarkdown.html for an example.
Although JuliaCall is already on CRAN, this new feature is still in the development version on github. If you want to try it, use
devtools::install_github("Non-Contradiction/JuliaCall")
to install JuliaCall.
The feature includes
Multiple julia chunks running by same julia session.
Accessing R variables, functions inside julia code and vice versa.
The current limitation is that it only fully support html output.

With Restructured Text, there is good support for including code samples, where each code-block directive can include the relevant
language.
.. code-block:: ruby
Some Ruby code.
Markdown also supports mentioning the language with each code block, e.g.:
```javascript
var s = "JavaScript syntax highlighting";
alert(s);
```
```python
s = "Python syntax highlighting"
print s
```
```
No language indicated, so no syntax highlighting.
But let's throw in a <b>tag</b>.
```

I think Beaker Notebook is actually a very good solution for your needs. It is a polyglot tool which will let you combine R, Python and Julia very well. There is a Vim editing mode which is not perfect, but still quite fast. There are shortcut keys for executing cells quickly, executing only selected lines, as well as jumping between cells. Beaker is also a permissively licensed open source project on GitHub with a very responsive maintainer, so you could also contribute any missing features directly as PRs.

Python source analyzer that works with zope components

There's a code base (Py 2.6 stuff) that extensively uses zope components and there's a need for tool that can analyze the sources. The tool is supposed to analyze the source; like look for usage of some restricted classes/objects/interfaces, etc. Basically, scan each statement in the source while being aware of the statement's context (which class/method/function it is in, which module, etc) and analyze it for specific patterns.
The approach of reading the source into text buffers and matching patterns doesn't seem to be a reliable approach at all.
Another approach that came up was of using inspect, but inspect seems to be broken and doesn't seem to be able to handle our code base (multiple tries, all of them crashed inspect). I might have to give up on that as there seems to be now way out with inspect.
Now, other options that I could think of are using pylint or using AST (and doing a lot of postprocessing on it). Am not sure to what extent pylint is extensible; and when it comes to analyzing source statments if pylint could be aware of context (i.e, which class defn/function/method has this statement etc) Using AST seems too much of an overkill for such a trivial purpose.
What would you suggest as a suitable approach here? Anybody else had to deal with such an issue before?
Please suggest.

Generating parser in Python language from JavaCC source?

I do mean the ??? in the title because I'm not exactly sure. Let me explain the situation.
I'm not a computer science student & I never did any compilers course. Till now I used to think that compiler writers or students who did compilers course are outstanding because they had to write Parser component of the compiler in whatever language they are writing the compiler. It's not an easy job right?
I'm dealing with Information Retrieval problem. My desired programming language is Python.
Parser Nature:
http://ir.iit.edu/~dagr/frDocs/fr940104.0.txt is the sample corpus. This file contains around 50 documents with some XML style markup. (You can see it in above link). I need to note down other some other values like <DOCNO> FR940104-2-00001 </DOCNO> & <PARENT> FR940104-2-00001 </PARENT> and I only need to index the <TEXT> </TEXT> portion of document which contains some varying tags which I need to strip down and a lot of  comments that are to be neglected and some &hyph; &space; & character entities. I don't know why corpus has things like this when its know that it's neither meant to be rendered by browser nor a proper XML document.
I thought of using any Python XML parser and extract desired text. But after little searching I found JavaCC parser source code (Parser.jj) for the same corpus I'm using here. A quick look up on JavaCC followed by Compiler-compiler revealed that after all compiler writers aren't as great as I thought. They use Compiler-compiler to generate parser code in desired language. Wiki says input to compiler-compiler is input is a grammar (usually in BNF). This is where I'm lost.
Is Parser.jj the grammar (Input to compiler-compiler called JavaCC)? It's definitely not BNF. What is this grammar called? Why is this grammar has Java language? Isn't there any universal grammar language?
I want python parser for parsing the corpus. Is there any way I can translate Parser.jj to get python equivalent? If yes, what is it? If no, what are my other options?
By any chance does any one know what is this corpus? Where is its original source? I would like to see some description for it. It is distributed on internet with name frDocs.tar.gz

Why do you call this "XML-style" markup? - this looks like pretty standard/basic XML to me.
Try elementTree or lxml. Instead of writing a parser, use one of the stable, well-hardened libraries that are already out there.

You can't build a parser - let alone a whole compiler - from a(n E)BNF grammar - it's just the grammar, i.e. syntax (and some syntax, like Python's indentation-based block rules, can't be modeled in it at all), not the semantics. Either you use seperate tools for these aspects, or use a more advances framework (like Boost::Spirit in C++ or Parsec in Haskell) that unifies both.
JavaCC (like yacc) is responsible for generating a parser, i.e. the subprogram that makes sense of the tokens read from the source code. For this, they mix a (E)BNF-like notation with code written in the language the resulting parser will be in (for e.g. building a parse tree) - in this case, Java. Of course it would be possible to make up another language - but since the existing languages can handle those tasks relatively well, it would be rather pointless. And since other parts of the compiler might be written by hand in the same language, it makes sense to leave the "I got ze tokens, what do I do wit them?" part to the person who will write these other parts ;)
I never heard of "PythonCC", and google didn't either (well, theres a "pythoncc" project on google code, but it's describtion just says "pythoncc is a program that tries to generate optimized machine Code for Python scripts." and there was no commit since march). Do you mean any of these python parsing libraries/tools? But I don't think there's a way to automatically convert the javaCC code to a Python equivalent - but the whole thing looks rather simple, so if you dive in and learn a bit about parsing via javaCC and [python library/tool of your choice], you might be able to translate it...

wiki/docbook/latex documentation template system

I'm searching for a documentation template system or rather will be creating one.
It should support the following features:
Create output in PDF and HTML
Support for large & complicated (LaTeX) formulas
References between documents
Bibliographies
Templates will be filled by a Python script
I've tried LaTeX with various TeX-to-HTML converters but I'm not satisfied with the results.
I've been using DocBook for a while, but I think that editing DocBook is not easy to write and the support for formulas is not yet sufficient.
The main problem is, that there will be users of this system that do not know LaTeX syntax or DocBook. I've thought about an alternative for these users providing an editing possibility with Wiki syntax (converted by Python to LaTeX).
Let's sum up: I want HTML and PDF output from at least LaTeX and Wiki input. DocBook could be used as intermediate format.
Has anybody had a similar problem or can give me an advice on which tools and which file formats I should use ?

We use sphinx: https://www.sphinx-doc.org
It does almost all of that.
Your python script or your users or whomever (I can't follow the question) can create content using RST markup (which is perhaps the easiest of markup languages). You run it through Sphinx and you get HTML and Latex.

I created a LaTeX pre-processor and python module that allows you to embed python or SQL inside a LaTeX file. The python and/or SQL is executed and the output is folded in.
With latex2html or latex2rtf you can then use the LaTeX code to produce HTM and RTF.
I've posted it for you at http://simson.net/pylatex/

Arbortext supports LaTeX natively. You can send the publishing engine or print composer LaTeX and it'll pass it through directly.
It also supports a lot of other composition languages as well and even gives the opportunity to do page-layout manipulation like you'd see in InDesign (without the headache and overhead of ID).

I think that Asciidoc is better targeted at what you are trying to get. It is a simple markup language, it allows latex formulas in it and it generates Docbook documents from which you can further generate the readable HTML or Latex representation

Sweave for python

I've recently started using Sweave* for creating reports of analyses run with R, and am now looking to do the same with my python scripts.
I've found references to embedding python in Sweave docs, but that seems like a bit of a hack. Has anyone worked out a better solution, or is there an equivalent for python I'm not aware of?
* Sweave is a tool that allows to embed the R code for complete data analyses in latex documents

I have written a Python implementation of Sweave called Pweave that implements basic functionality and some options of Sweave for Python code embedded in reST or Latex document. You can get it here: http://mpastell.com/pweave and see the original blog post here: http://mpastell.com/2010/03/03/pweave-sweave-for-python/

Some suggestions:
I have been using Pweave for several years now, and it is very similar to Sweave. Highly recommended.
The most popular tool for embedded reports in python at this stage is Jupyter notebooks, which allow you to embed markdown, and they are quite useful although I personally still like writing things in LaTeX...
You can also have a look at PyLit, which is intended for literate programming with Python, but not as well maintained as some of the alternatives.
Sphinx is great for documenting with python, and can output LaTex.
Here's a list of tools for literate programming. Some of these work with any programming language.

Dexy is a very similar product to Sweave. One advantage of Dexy is that it is not exclusive to one single language. You could create a Dexy document that included R code, Python code, or about anything else.

This is a bit late, but for future reference you might consider my PythonTeX package for LaTeX. PythonTeX allows you enter Python code in a LaTeX document, run it, and bring back the output. But unlike Sweave, the document you actually edit is a valid .tex document (not .Snw or .Rnw), so editing the non-code part of the document is fast and convenient.
PythonTeX provides many features, including the following:
The document can be compiled without running any Python code; code only needs to be executed when it is modified.
All Python output is saved or cached.
Code runs in user-defined sessions. If there are multiple sessions, sessions automatically run in parallel using all available cores.
Errors and warnings are synchronized with the line numbers of the .tex document, so you know exactly where they came from.
Code can be executed, typeset, or typeset and executed. Syntax highlighting is provided by Pygments.
Anything printed by Python is automatically brought into the .tex document.
You can customize when code is re-executed (modified, errors, warnings, etc.).
The PythonTeX utilities class is available in all code that is executed. It allows you to automatically track dependencies and specify created files that should be cleaned up. For example, you can set the document to detect when the data it depends on is modified, so that code will be re-executed.
A basic PythonTeX file looks like this:
\documentclass{article}
\usepackage{pythontex}
\begin{document}
\begin{pycode}
#Whatever you want here!
\end{pycode}
\end{document}

You might consider noweb, which is language independent and is the basis for Sweave. I've used it for Python and it works well.
http://www.cs.tufts.edu/~nr/noweb/

I've restructured Matti's Pweave a bit, so that it is possible to define arbitrary "chunk-processors" as plugin-modules. This makes it easy to extend for several chunk-based text-preprocessing applications. The restructured version is available at https://bitbucket.org/edgimar/pweave/src. As an example, you could write the following LaTeX-Pweave document (notice the "processor name" in this example is specified with the name 'mplfig'):
\documentclass[a4paper]{article}
\usepackage{graphicx}
\begin{document}
\title{Test document}
\maketitle
Don't miss the great information in Figure \ref{myfig}!
<<p=mplfig, label=myfig, caption = "Figure caption...">>=
import sys
import pylab as pl
pl.plot([1,2,3,4,5],['2,4,6,8,10'], 'b.', markersize=15)
pl.axis('scaled')
pl.axis([-3,3, -3,3]) # [xmin,xmax, ymin,ymax]
#
\end{document}

You could try SageTeX which implements Sweave-Like functionality for the SAGE mathematics platform. I haven't played around with it as much as I would like to, but SAGE is basically a python shell and evaluates python as it's native language.

I have also thought about the same thing many times. After reading your questions and looking into your link I made small modifications to the custom python Sweave driver, that you link to. I modified it to also keep the source code and produce the output as well the same way that Sweave does for R.
I posted the modified version and an example here: http://mpastell.com/2010/02/09/python-in-sweave-document/
Granted, it is not optimal but I'm quite happy with the output and I like the ability to include both R and Python in the same document.
Edit about PyLit:
I also like PyLit and contrary to my original answer you can catch ouput with it as well, although it not as elegant as Sweave! Here is a small example how to do it:
import sys
# Catch PyLit output
a = range(3)
sys.stdout = open('output.txt', 'w')
print a
sys.stdout = sys.__stdout__
# .. include:: output.txt

What you're looking for is achieved with GNU Emacs and org-mode*. org-mode does far more than can be detailed in a single response, but the relevant points are:
Support for literate programming with the ability to integrate multiple languages within the same document (including using one language's results as the input for another language).
Graphics integration.
Export to LaTeX, HTML, PDF, and a variety of other formats natively, automatically generating the markup (but you can do it manually, too).
Everything is 100% customizable, allowing you to adapt the editor to your needs.
I don't have Python installed on my system, but below is an example of two different languages being run within the same session. The excerpt is modified from the wonderful org-mode R tutorial by Erik Iverson which explains the set up and effective use of org-mode for literate programming tasks. This SciPy 2013 presentation demonstrates how org-mode can be integrated into a workflow (and happens to use Python).
Emacs may seem intimidating. But for statistics/data science, it offers tremendous capabilities that either aren't offered anywhere else or are spread across various systems. Emacs allows you to integrate them all into a single interface. I think Daniel Gopar says it best in his Emacs tutorial,
Are you guys that lazy? I mean, c'mon, just read the tutorial, man.
An hour or so with the Emacs tutorial opens the door to some extremely powerful tools.
* Emacs comes with org-mode. No separate install is required.

Well, with reticulate which is a recent best implementation of a Python interface in R you could continue using Sweave and call Python inline using the R interpreter. For example this now works in a .Rnw or .Rmd markdown file.
```{r example, include=FALSE}
library(reticulate)
use_python("./dir/python")
```
```{python}
import pandas
data = pandas.read_csv("./data.csv")
print(data.head())
```

I think that Jupyter-book may do what you want.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.