Sweave for python

I've recently started using Sweave* for creating reports of analyses run with R, and am now looking to do the same with my python scripts.
I've found references to embedding python in Sweave docs, but that seems like a bit of a hack. Has anyone worked out a better solution, or is there an equivalent for python I'm not aware of?
* Sweave is a tool that allows you to embed the R code for complete data analyses in LaTeX documents.

I have written a Python implementation of Sweave called Pweave that implements the basic functionality and some of the options of Sweave for Python code embedded in reST or LaTeX documents. You can get it here: http://mpastell.com/pweave and see the original blog post here: http://mpastell.com/2010/03/03/pweave-sweave-for-python/

Some suggestions:
I have been using Pweave for several years now, and it is very similar to Sweave. Highly recommended.
The most popular tool for embedded reports in python at this stage is Jupyter notebooks, which allow you to embed markdown, and they are quite useful although I personally still like writing things in LaTeX...
You can also have a look at PyLit, which is intended for literate programming with Python, but it is not as well maintained as some of the alternatives.
Sphinx is great for documenting Python projects, and can output LaTeX.
Here's a list of tools for literate programming. Some of these work with any programming language.

Dexy is a very similar product to Sweave. One advantage of Dexy is that it is not exclusive to one single language. You could create a Dexy document that included R code, Python code, or about anything else.

This is a bit late, but for future reference you might consider my PythonTeX package for LaTeX. PythonTeX allows you to enter Python code in a LaTeX document, run it, and bring back the output. But unlike Sweave, the document you actually edit is a valid .tex document (not .Snw or .Rnw), so editing the non-code part of the document is fast and convenient.
PythonTeX provides many features, including the following:
The document can be compiled without running any Python code; code only needs to be executed when it is modified.
All Python output is saved or cached.
Code runs in user-defined sessions. If there are multiple sessions, sessions automatically run in parallel using all available cores.
Errors and warnings are synchronized with the line numbers of the .tex document, so you know exactly where they came from.
Code can be executed, typeset, or typeset and executed. Syntax highlighting is provided by Pygments.
Anything printed by Python is automatically brought into the .tex document.
You can customize when code is re-executed (modified, errors, warnings, etc.).
The PythonTeX utilities class is available in all code that is executed. It allows you to automatically track dependencies and specify created files that should be cleaned up. For example, you can set the document to detect when the data it depends on is modified, so that code will be re-executed.
A basic PythonTeX file looks like this:
\documentclass{article}
\usepackage{pythontex}
\begin{document}
\begin{pycode}
#Whatever you want here!
\end{pycode}
\end{document}

You might consider noweb, which is language independent and is the basis for Sweave. I've used it for Python and it works well.
http://www.cs.tufts.edu/~nr/noweb/

I've restructured Matti's Pweave a bit, so that it is possible to define arbitrary "chunk-processors" as plugin-modules. This makes it easy to extend for several chunk-based text-preprocessing applications. The restructured version is available at https://bitbucket.org/edgimar/pweave/src. As an example, you could write the following LaTeX-Pweave document (notice the "processor name" in this example is specified with the name 'mplfig'):
\documentclass[a4paper]{article}
\usepackage{graphicx}
\begin{document}
\title{Test document}
\maketitle
Don't miss the great information in Figure \ref{myfig}!
<<p=mplfig, label=myfig, caption = "Figure caption...">>=
import sys
import pylab as pl
pl.plot([1,2,3,4,5], [2,4,6,8,10], 'b.', markersize=15)
pl.axis('scaled')
pl.axis([-3,3, -3,3]) # [xmin,xmax, ymin,ymax]
@
\end{document}

You could try SageTeX, which implements Sweave-like functionality for the SAGE mathematics platform. I haven't played around with it as much as I would like to, but SAGE is basically a Python shell and evaluates Python as its native language.

I have also thought about the same thing many times. After reading your question and looking into your link, I made small modifications to the custom Python Sweave driver that you link to. It now keeps the source code and also produces the output, the same way that Sweave does for R.
I posted the modified version and an example here: http://mpastell.com/2010/02/09/python-in-sweave-document/
Granted, it is not optimal but I'm quite happy with the output and I like the ability to include both R and Python in the same document.
Edit about PyLit:
I also like PyLit and, contrary to my original answer, you can catch output with it as well, although it is not as elegant as Sweave! Here is a small example of how to do it:
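import sys
# Catch PyLit output by temporarily pointing stdout at a file
a = list(range(3))
sys.stdout = open('output.txt', 'w')
print(a)
sys.stdout.close()  # flush the captured text to disk
sys.stdout = sys.__stdout__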
# .. include:: output.txt
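On current Python versions the same capture can be written a bit more safely with contextlib.redirect_stdout, which restores stdout even if the code raises. This is only a sketch: the helper name write_output and the demo function are made up for illustration, and the file name simply matches the one used above.

```python
from contextlib import redirect_stdout
from pathlib import Path


def write_output(path, func):
    """Run func() with everything it prints redirected into the file at path."""
    with open(path, "w") as fh, redirect_stdout(fh):
        func()


def demo():
    # Hypothetical stand-in for the code whose output should be captured
    a = list(range(3))
    print(a)


out_file = Path("output.txt")  # same file name as in the PyLit example
write_output(out_file, demo)
```

The captured file can then be pulled into the document with the same include line as before.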

What you're looking for is achieved with GNU Emacs and org-mode*. org-mode does far more than can be detailed in a single response, but the relevant points are:
Support for literate programming with the ability to integrate multiple languages within the same document (including using one language's results as the input for another language).
Graphics integration.
Export to LaTeX, HTML, PDF, and a variety of other formats natively, automatically generating the markup (but you can do it manually, too).
Everything is 100% customizable, allowing you to adapt the editor to your needs.
I don't have Python installed on my system, but below is an example of two different languages being run within the same session. The excerpt is modified from the wonderful org-mode R tutorial by Erik Iverson which explains the set up and effective use of org-mode for literate programming tasks. This SciPy 2013 presentation demonstrates how org-mode can be integrated into a workflow (and happens to use Python).
Emacs may seem intimidating. But for statistics/data science, it offers tremendous capabilities that either aren't offered anywhere else or are spread across various systems. Emacs allows you to integrate them all into a single interface. I think Daniel Gopar says it best in his Emacs tutorial,
Are you guys that lazy? I mean, c'mon, just read the tutorial, man.
An hour or so with the Emacs tutorial opens the door to some extremely powerful tools.
* Emacs comes with org-mode. No separate install is required.

With reticulate, which is a recent and arguably the best implementation of a Python interface in R, you could keep using Sweave and call Python inline through the R interpreter. For example, this now works in a .Rnw or .Rmd file (the chunks below use R Markdown syntax).
```{r example, include=FALSE}
library(reticulate)
use_python("./dir/python")
```
```{python}
import pandas
data = pandas.read_csv("./data.csv")
print(data.head())
```

I think that Jupyter-book may do what you want.

Related

How can I get the source code for Python functions?

I am learning Python and how to make classes. I was curious how the classes are made inside Python itself! For example, in datetime.py (I found it by googling) I was checking how they implemented __add__ and __sub__, which use "if isinstance(other, timedelta):". That was interesting to learn. I am also learning what professionally written programs look like.
My question is: how can I find the source code of the internal classes and functions inside Python? For example, I am interested to see how addition is implemented so that print(1+2) gives 3 and print('a'+'b') gives ab.

The source code for the reference implementation of Python is available here at their GitHub mirror. However, it's worth noting that large parts of the Python language are implemented in C, including the core engine and many of the standard library modules. Really understanding how everything is implemented under the hood requires a fair amount of C fluency.

IPython is a great tool for exploring how things work. Just add "??" after a function or other callable, and it shows the code when possible, i.e. when it's "pure Python".
Eg:
import this
this??
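Outside IPython, the standard library's inspect module gives roughly the same information. The sketch below picks json.dumps purely as an example of a function written in Python; for anything implemented in C, such as print, inspect.getsource raises TypeError instead.

```python
import inspect
import json

# json.dumps is implemented in pure Python, so its source can be retrieved.
source = inspect.getsource(json.dumps)
source_file = inspect.getsourcefile(json.dumps)
print(source_file)             # path of the .py file on this machine
print(source.splitlines()[0])  # first line of the function definition

# print() is implemented in C, so there is no Python source to show.
try:
    inspect.getsource(print)
    builtin_has_source = True
except TypeError:
    builtin_has_source = False
print(builtin_has_source)
```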

Python is an open source language, which means the source code is available to any interested party. I would suggest looking at the source files on the machine you are using or looking at the CPython GitHub repo.
print() is a built-in function, not a module. It is written in C, and the source can be viewed in the file Python/bltinmodule.c.
You may also find it useful to learn about the functions available in Python for getting help, like help() (documentation available here). To learn about the print() function you can call:
help(print)
I recommend reading the Beginner's Guide as a starting point for more resources.
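As for the addition itself: print() never sees the +. The expression is evaluated first, and + is dispatched to the __add__ method of the left operand's type, which is why ints add and strings concatenate. A small sketch, where the Money class is a made-up example in the spirit of the isinstance check you saw in datetime.py:

```python
class Money:
    """Toy class (hypothetical, for illustration) that defines its own +."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        # Same pattern as timedelta: only handle operands we understand.
        if isinstance(other, Money):
            return Money(self.cents + other.cents)
        return NotImplemented

    def __str__(self):
        return f"{self.cents / 100:.2f}"


int_sum = (1).__add__(2)        # what 1 + 2 does under the hood
str_sum = "a".__add__("b")      # what 'a' + 'b' does under the hood
total = Money(150) + Money(275)  # calls Money.__add__

print(int_sum, str_sum, total)
print(type(print).__name__)     # print itself is a C built-in
```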

How to get code-completion for COM programming in PyCharm?

When using app = win32com.client.Dispatch('Some.Application'), is there any feasible way to get code completion in PyCharm? It is rather tedious having to retype (or copy-paste) everything from the API documentation, and creating skeletons by hand would be just as tedious. Is there no other way to let PyCharm know about the interface provided via COM, especially if I can provide a .tlb file? Or is there at least some way to automatically generate such a skeleton (or a wrapping module) from the TypeLib?

Since there is no way for PyCharm to know the runtime type of app, you shouldn't expect to get code completion on app directly; at least not until they decide to add built-in support for generating code from type libraries.
However, you can exploit the fact that win32com implicitly generates code based on the type library as described in the first part of this answer, together with PyCharm's support for type hinting, to get code completion on COM methods.
Make sure that the Python types have been generated; their location is determined by the GUID of the COM object. For example, the types for Microsoft Word 2016 on my machine are available in
C:\Users\username\appdata\local\temp\gen_py\3.6\00020905-0000-0000-c000-000000000046x0x8x7\.
Add this folder to the path of your PyCharm Python interpreter; see e.g. this answer.
Import the modules for which you want code completion.
In the screenshots below, we use this approach with Word's Find:
Now, besides feeling dirty, this approach relies on the relevant types having been generated and the code completion is limited to the methods published by the object, so I imagine its usefulness in practice might be somewhat limited; in particular, anybody working on the code will have to generate the code, or the annotations will cause NameErrors. Personally, I would probably prefer using Jupyter for the exploratory part of the implementation process, and with minimal tweaks outlined in the answer mentioned above, Jupyter can be extended to have full code completion with win32com.

Is there a way to add Julia, R and python to a single text file like R markdown or a notebook that could be manipulated as a text file?

Stated briefly: I would like to have a text file where I can smoothly switch among R, Python and Julia. Importantly, I am looking for a way to run the code rather than just display it.
I know it is possible to add Python (and many other languages) to R Markdown (http://goo.gl/4w8XIb), but I'm not sure I could add Julia. It is also possible to use notebooks like Beaker (http://beakernotebook.com/) with all three languages (and more), but my issue with notebooks is that they are not nearly as fast to manipulate as a text file in an editor environment (Sublime, Emacs, Vim, Atom ...). I know very little about notebooks, and the ones I know of are stored as JSON files, but manipulating a JSON file to write a report is anything but user friendly.
I'm probably missing the obvious, but is there any other way to do this? Thanks.

I recently created an R package, JuliaCall, which can be used as a Julia engine in R Markdown documents; see https://non-contradiction.github.io/JuliaCall/articles/JuliaCall_in_RMarkdown.html for an example.
Although JuliaCall is already on CRAN, this new feature is still in the development version on github. If you want to try it, use
devtools::install_github("Non-Contradiction/JuliaCall")
to install JuliaCall.
The feature includes:
Multiple Julia chunks running in the same Julia session.
Accessing R variables and functions inside Julia code, and vice versa.
The current limitation is that it only fully supports HTML output.

With reStructuredText, there is good support for including code samples, where each code-block directive can include the relevant language.
.. code-block:: ruby

   Some Ruby code.
Markdown also supports mentioning the language with each code block, e.g.:
```javascript
var s = "JavaScript syntax highlighting";
alert(s);
```
```python
s = "Python syntax highlighting"
print(s)
```
```
No language indicated, so no syntax highlighting.
But let's throw in a <b>tag</b>.
```

I think Beaker Notebook is actually a very good solution for your needs. It is a polyglot tool which will let you combine R, Python and Julia very well. There is a Vim editing mode which is not perfect, but still quite fast. There are shortcut keys for executing cells quickly, executing only selected lines, as well as jumping between cells. Beaker is also a permissively licensed open source project on GitHub with a very responsive maintainer, so you could also contribute any missing features directly as PRs.

enable markdown syntax highlighting in code comments/docs

There are great in-code documentation standards for python, for example:
Google
Numpy
Which use some nice, simple ReStructured / Markdown like syntax. Is there a way to have emacs render ReST / Md inside the comments of python code? I.e. the major-mode would still be python.el, and normal python syntax would work; but inside a comment block ('''...''' ) it would be rendered as some sort of markdown.

You should be able to achieve this by using the polymode package (https://github.com/vspinu/polymode). It allows you to have multiple major modes in one buffer. Have a look at the screenshot examples.
There are other packages too which can enable multiple major modes in one file: http://www.emacswiki.org/emacs/MultipleModes

wiki/docbook/latex documentation template system

I'm searching for a documentation template system or rather will be creating one.
It should support the following features:
Create output in PDF and HTML
Support for large & complicated (LaTeX) formulas
References between documents
Bibliographies
Templates will be filled by a Python script
I've tried LaTeX with various TeX-to-HTML converters but I'm not satisfied with the results.
I've been using DocBook for a while, but I think that DocBook is not easy to write and its support for formulas is not yet sufficient.
The main problem is, that there will be users of this system that do not know LaTeX syntax or DocBook. I've thought about an alternative for these users providing an editing possibility with Wiki syntax (converted by Python to LaTeX).
Let's sum up: I want HTML and PDF output from at least LaTeX and Wiki input. DocBook could be used as intermediate format.
Has anybody had a similar problem, or can anybody give me advice on which tools and which file formats I should use?

We use sphinx: https://www.sphinx-doc.org
It does almost all of that.
Your Python script or your users or whomever (I can't follow the question) can create content using RST markup (which is perhaps the easiest of markup languages). You run it through Sphinx and you get HTML and LaTeX.
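Since the question says the templates will be filled by a Python script, here is a minimal sketch of that step. It assumes plain string.Template from the standard library and a made-up report layout; the field names, title, and values are all hypothetical, and a real setup might well use a proper template engine instead.

```python
from string import Template

# Hypothetical report template in reStructuredText; the field names are made up.
RST_TEMPLATE = Template("""\
$title
$underline

Measured by $author.

.. math::

   E = m c^2

Result: $value $unit
""")


def render_report(title, author, value, unit):
    """Fill the template and return reStructuredText ready for Sphinx."""
    return RST_TEMPLATE.substitute(
        title=title,
        underline="=" * len(title),  # RST section underline matches title length
        author=author,
        value=value,
        unit=unit,
    )


rst_text = render_report("Energy report", "A. User", 42.0, "J")
print(rst_text)
```

Writing rst_text to a .rst file in the Sphinx source directory and running sphinx-build with the html or latex builder should then give both outputs from the same input.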

I created a LaTeX pre-processor and python module that allows you to embed python or SQL inside a LaTeX file. The python and/or SQL is executed and the output is folded in.
With latex2html or latex2rtf you can then use the LaTeX code to produce HTML and RTF.
I've posted it for you at http://simson.net/pylatex/

Arbortext supports LaTeX natively. You can send the publishing engine or print composer LaTeX and it'll pass it through directly.
It also supports a lot of other composition languages as well and even gives the opportunity to do page-layout manipulation like you'd see in InDesign (without the headache and overhead of ID).

I think that AsciiDoc is better targeted at what you are trying to get. It is a simple markup language, it allows LaTeX formulas, and it generates DocBook documents from which you can further generate a readable HTML or LaTeX representation.
