Difficult workflow writing Latex book full of Python code

I'm writing a book on coding in Python using LaTeX. I plan on having a lot of text with Python code interspersed throughout, along with its output. What's really giving me trouble is that when I need to go back and edit my Python code, it's a huge pain to get it back nicely into my latest document.
I've done a whole lot of research and can't seem to find a good solution.
This one includes full files as one, doesn't solve my issues
https://tex.stackexchange.com/questions/289385/workflow-for-including-jupyter-aka-ipython-notebooks-as-pages-in-a-latex-docum
Same with this one.
http://blog.juliusschulz.de/blog/ultimate-ipython-notebook
Found Solution 1 (awful)
I can copy and paste python code into latex ok using the listings latex package.
Pros:
Easy to update only small section of code.
Cons:
For output need to run in python, copy, paste separately.
Initial writing SLOW, need to do this process hundreds of times per chapter.
Found Solution 2 (bad)
Use jupyter notebook with markdown, export to Latex, \include file into main Latex document.
Pros:
Streamlined
Has output contained within.
Cons:
To make small changes, need to reimport whole document, any changes made to markdown text within Latex editor are not saved
Renaming a single variable in python after jupyter notebook could take hours.
Editing seems like a giant chore.
Ideal solution
Write Text in Latex
Write python in jupyter notebook, export to latex.
Somehow include code snippets (small sections of the exported file) into different parts of the main latex book. This is the part I can't figure out
When python changes are needed, changes in jupyter, then re-export as latex file with same name
Latex book is automatically updated from includes.
The key here is that the exported python notebook is being split up and sent to different parts of the document. In order for that to work it needs to somehow be tagged or marked in the markdown or code of the notebook, so when I re-export it those same parts get sent to the same spots in the book.
Pros:
Python edits easy, easily propagated back to book.
Text written in latex, can use power of latex
Any help in coming up with a solution closer to my ideal solution would be much appreciated. It's killing me.
Probably doesn't matter, but I'm coding both latex and jupyter notebooks in VS Code. I'm open to changing tools if it means solving these problems.

Here's a small script I wrote. It splits a single *.ipynb file and converts it into multiple *.tex files.
Usage is:
Copy the following script and save it as something like main.py.
Execute python main.py init. It will create main.tex and style_ipython_custom.tplx.
In your Jupyter notebook, add an extra line #latex:tag_a, #latex:tag_b, ... to each cell that you want to extract. Cells with the same tag will be extracted to the same *.tex file.
Save it as a *.ipynb file. Fortunately, the current VS Code Python plugin supports exporting to *.ipynb; alternatively, use jupytext to convert from *.py to *.ipynb.
Run python main.py path/to/your.ipynb and it will create tag_a.tex and tag_b.tex.
Edit main.tex and add \input{tag_a.tex} or \input{tag_b.tex} wherever you want.
Run pdflatex main.tex and it will produce main.pdf.
The idea behind this script:
Converting from a Jupyter notebook to LaTeX using the default nbconvert.LatexExporter produces a complete LaTeX file which includes macro definitions. Using it to convert each cell may create large LaTeX files. To avoid the problem, the script first creates main.tex, which has only the macro definitions, and then converts each cell to a LaTeX file which has no macro definitions. This can be done using a custom template file which is slightly modified from style_ipython.tplx.
Tagging or marking the cell might be done using cell metadata, but I could not find how to set it in the VS Code Python plugin (Issue), so instead the script scans the source of each cell with the regex pattern ^#latex:(.*) and removes the matching line before converting the cell to a LaTeX file.
Source:
import sys
import re
import os
from collections import defaultdict
import nbformat
from nbconvert import LatexExporter, exporters
OUTPUT_FILES_DIR = './images'
CUSTOM_TEMPLATE = 'style_ipython_custom.tplx'
MAIN_TEX = 'main.tex'
def create_main():
    # creates `main.tex` which only has macro definition
    latex_exporter = LatexExporter()
    book = nbformat.v4.new_notebook()
    book.cells.append(
        nbformat.v4.new_raw_cell(r'\input{__your_input__here.tex}'))
    (body, _) = latex_exporter.from_notebook_node(book)
    with open(MAIN_TEX, 'x') as fout:
        fout.write(body)
    print("created:", MAIN_TEX)
def init():
    create_main()
    latex_exporter = LatexExporter()
    # copy `style_ipython.tplx` in `nbconvert.exporters` module to current directory,
    # and modify it so that it does not contain macro definition
    tmpl_path = os.path.join(
        os.path.dirname(exporters.__file__),
        latex_exporter.default_template_path)
    src = os.path.join(tmpl_path, 'style_ipython.tplx')
    target = CUSTOM_TEMPLATE
    with open(src) as fsrc:
        with open(target, 'w') as ftarget:
            for line in fsrc:
                # replace the line so that it does not contain macro definition
                if line == "((*- extends 'base.tplx' -*))\n":
                    line = "((*- extends 'document_contents.tplx' -*))\n"
                ftarget.write(line)
    print("created:", CUSTOM_TEMPLATE)
def group_cells(note):
    # scan the cell source for tag with regexp `^#latex:(.*)`
    # if the same tag is found, group the cells into the same list
    pattern = re.compile(r'^#latex:(.*?)$(\n?)', re.M)
    group = defaultdict(list)
    for num, cell in enumerate(note.cells):
        m = pattern.search(cell.source)
        if m:
            tag = m.group(1).strip()
            # remove the line which contains tag
            cell.source = cell.source[:m.start(0)] + cell.source[m.end(0):]
            group[tag].append(cell)
        else:
            print("tag not found in cell number {}. ignore".format(num + 1))
    return group
def doit():
    with open(sys.argv[1]) as f:
        note = nbformat.read(f, as_version=4)
    group = group_cells(note)
    latex_exporter = LatexExporter()
    # use the template which does not contain LaTex macro definition
    latex_exporter.template_file = CUSTOM_TEMPLATE
    try:
        os.mkdir(OUTPUT_FILES_DIR)
    except FileExistsError:
        pass
    for (tag, g) in group.items():
        book = nbformat.v4.new_notebook()
        book.cells.extend(g)
        # unique_key will be prefix of image
        (body, resources) = latex_exporter.from_notebook_node(
            book,
            resources={
                'output_files_dir': OUTPUT_FILES_DIR,
                'unique_key': tag
            })
        ofile = tag + '.tex'
        with open(ofile, 'w') as fout:
            fout.write(body)
        print("created:", ofile)
        # the image data which is embedded as base64 in notebook
        # will be decoded and returned in `resources`, so write it to file
        for filename, data in resources.get('outputs', {}).items():
            with open(filename, 'wb') as fres:
                fres.write(data)
            print("created:", filename)
if len(sys.argv) <= 1:
    print("USAGE: this_script [init|yourfile.ipynb]")
elif sys.argv[1] == "init":
    init()
else:
    doit()
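The tagging step can be tried in isolation, without nbformat or nbconvert installed. The following is only a minimal sketch: it reuses the regex from the script, but works on plain strings, and the `group_sources` helper and the sample cell sources are made up for illustration.

```python
import re
from collections import defaultdict

# Same pattern as in the script: a whole line of the form "#latex:<tag>".
TAG_PATTERN = re.compile(r'^#latex:(.*?)$(\n?)', re.M)


def group_sources(sources):
    """Group cell sources by their #latex tag and strip the tag line."""
    groups = defaultdict(list)
    for source in sources:
        match = TAG_PATTERN.search(source)
        if match is None:
            continue  # untagged cells are skipped, as in the script
        tag = match.group(1).strip()
        cleaned = source[:match.start(0)] + source[match.end(0):]
        groups[tag].append(cleaned)
    return dict(groups)


# Hypothetical cell sources, standing in for cell.source in a real notebook.
cells = [
    "#latex:tag_a\nx = 1\nprint(x)",
    "y = 2  # no tag, ignored",
    "print('hello')\n#latex:tag_b",
    "#latex:tag_a\nprint(x + 1)",
]
groups = group_sources(cells)
print(sorted(groups))
```

Cells that share a tag end up in the same list, in notebook order, which is what lets one tag map to one *.tex file.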

Jupyter does not allow exporting specific cells from a notebook -- it only allows you to export the entire notebook. To get as close to your ideal scenario as possible, you need a modular Jupyter set-up:
Split your single Jupyter notebook into smaller notebooks.
Each notebook can then be exported to LaTeX via File > Download as > LaTeX (.tex)
In LaTeX, you can then import the generated .tex file via
\input{filename.tex}
If you want to import the smaller notebooks into cells of your main notebook, you can do this via (see magic command run)
%run my_other_notebook.ipynb #or %run 'my notebook with spaces.ipynb'
You can also insert python files via (see magic command load)
%load python_file.py
which loads the Python file and allows you to execute it in your main notebook.
You can also have small .py snippets, load them in your small Jupyter notebook, and then run that small notebook in your larger one.
Your use of VS Code is fine, though Jupyter in the browser may be faster for you to edit in.
(reference for all magic commands)
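If you go the route of many small notebooks, the export step can be scripted instead of clicked. This is a rough sketch, assuming the `jupyter nbconvert --to latex` command line is available; the helper names and the `chapter01.ipynb` file name are made up. Note that the default export is a complete LaTeX document with its own preamble, so it may need trimming (or a custom template) before \input works inside a larger book.

```python
import subprocess
from pathlib import Path


def nbconvert_command(notebook):
    """Build the nbconvert command line for one notebook."""
    return ["jupyter", "nbconvert", "--to", "latex", str(notebook)]


def export_all(notebook_dir, dry_run=False):
    """Export every *.ipynb file in notebook_dir to LaTeX.

    With dry_run=True the commands are only returned, not executed.
    """
    notebooks = sorted(Path(notebook_dir).glob("*.ipynb"))
    commands = [nbconvert_command(nb) for nb in notebooks]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if an export fails.
            subprocess.run(command, check=True)
    return commands


# Hypothetical notebook name, only to show the shape of the command.
print(nbconvert_command("chapter01.ipynb"))
```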

I would use bookdown to have both text and source code in the same document (split over several files for convenience). This package originates in the R world, but is also usable together with other languages. Here is a very simple example:
---
output: bookdown::pdf_document2
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Setup data
First we define some variables with data.
```{python data}
bob = ['Bob Smith', 42, 30000, 'software']
sue = ['Sue Jones', 45, 40000, 'music']
```
# Output data
Then we output some of the data.
```{python output}
bob[0], sue[2]
```
# Reference code block
Finally, let's repeat the code block without evaluating it.
```{python, ref.label="output", eval = FALSE}
```

Related

Can a Python Executable create files?

I converted my Python code into an executable using PyInstaller, but part of my project involves printing the result to a text document stored on the computer. However, this same code doesn't work once it is in the executable. Separately, I also tried to get it to just create files using the
with open(f'{random.random()}.txt', 'x') as sys.stdout:
    print(encmsg)
option, using a random number generator to create new file names so they don't overlap. It still didn't create anything.
In summary:
What code, when run from an executable file, will write the string it produces to a text file somewhere on the computer?
Thank you.
The relevant code is:
string = '1219151'
with open(f'{random.random()}.txt', 'x') as sys.stdout:
    print(string)
That is it. It's in an executable, but doesn't produce the txt file anywhere.
I am on Windows 10.
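The question does not give enough detail to say why no file appears, but one thing worth ruling out is the working directory: a relative file name is resolved against the process's current working directory, which for an executable may not be the folder you are looking in. The sketch below is only an illustration under that assumption. It writes to an explicit directory (the home directory by default, an arbitrary choice) and passes the file to print instead of rebinding sys.stdout; `write_result` is a made-up helper name.

```python
import random
import tempfile
from pathlib import Path


def write_result(text, directory=None):
    """Write text to a new, randomly named .txt file and return its path."""
    # Assumption: the home directory is a writable, easy-to-find default.
    target_dir = Path(directory) if directory is not None else Path.home()
    target = target_dir / f"{random.random()}.txt"
    # 'x' mode fails if the file already exists, as in the question.
    with open(target, "x", encoding="utf-8") as handle:
        print(text, file=handle)
    return target.resolve()


# Demo: write into the system temp directory and show the absolute path.
print("wrote", write_result("1219151", tempfile.gettempdir()))
```

Printing the absolute path makes it easy to see where the file actually ended up.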

How to call R Markdown knit function from Python script

I have a Python script that creates multiple .Rmd files, and I wanted a way to automatically turn them into .html's without having to manually knit each within RStudio. I've probably spent around 4 hours researching and trying different options, and although I've managed to make it work by calling a .R script with
subprocess.Popen(['Rscript', '--vanilla', 'rmd2html.R'], shell=False)
that then does the knitting with
rmarkdown::render("dicionarioNew.Rmd", "html_document"),
this for some reason does not use UTF-8 (which I need) and doesn't easily allow me to store the number of times the program has been run (necessary for giving a different name to each html file).

Your title question is answered in your question's body, but you have more specific needs not met by your current implementation:
how to render .Rmd using UTF-8
how to render .Rmd to html with custom filename output
I suggest you try seeking out the answers to these individually.
As a general answer to this question however, I suggest you consider using the Rscript command to run a custom R script which does what you want based on the source of rmd2html.R. You might also use R -e to execute a line or few of R code hardcoded as a string in your python script.
If you want to break things out further in python, there are many options for rendering at the chunk or file level individually using the sweave, rmarkdown, stationary, and other R packages.
Given a more specific example of what you are trying to accomplish someone may be able to help point you towards which of these many options would be right for your use case.
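As a rough illustration of the Rscript -e route, the sketch below builds the rmarkdown::render call as a string in Python and passes a per-run output file name. It assumes Rscript and the rmarkdown package are installed; the `render_command` and `render` helpers and the `_<run number>` naming scheme are made up for this example, and whether the encoding argument has any effect depends on the rmarkdown version.

```python
import subprocess


def render_command(rmd_path, output_file):
    """Build an Rscript command that renders one .Rmd file to HTML."""
    # Assumption: the paths contain no double quotes or backslashes, so they
    # can be embedded in the R expression without extra escaping.
    expr = (
        f'rmarkdown::render("{rmd_path}", "html_document", '
        f'output_file = "{output_file}", encoding = "UTF-8")'
    )
    return ["Rscript", "--vanilla", "-e", expr]


def render(rmd_path, run_number):
    """Render rmd_path to an HTML file whose name includes the run number."""
    stem = rmd_path.rsplit(".", 1)[0]
    output_file = f"{stem}_{run_number}.html"
    # check=True raises CalledProcessError if R exits with an error.
    subprocess.run(render_command(rmd_path, output_file), check=True)
    return output_file


# Only prints the command; call render(...) to actually run R.
print(render_command("dicionarioNew.Rmd", "dicionarioNew_1.html"))
```

Keeping the run counter on the Python side means the R code stays a one-liner.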

An alternative is to add a chunk that will knit the .rmd file when it is executed. See my approach below - it also opens the .html file in the view panel of Rstudio after knitting. That way you can probably also build in the tracker of how many times the code is run and add that into the file-name.
For the "auto_knit_chunk" shown below to work, you need to set up two variables, KNITreport (TRUE/FALSE) and the output directory (or remove their use if you don't need them), in a chunk above the "auto_knit_chunk".
#IN A CHUNK ABOVE THE auto_knit_chunk SET THE VARIABLES
#if this RMD file should be knitted, set KNITreport = TRUE
KNITreport <- TRUE
#set an output path
dir_output <- "./output/"
The "auto_knit_chunk" chunk should only be executed if knitr is not currently running, to avoid an infinite loop of calling knitr from each document that is knitted. So in the chunk header set: eval = !isTRUE(getOption('knitr.in.progress'))
# THIS IS THE HEADER ```{r, auto_knit_chunk, echo = FALSE, eval = !isTRUE(getOption('knitr.in.progress'))}
if(KNITreport)
{
#saving the currently open Rstudio file - needs to be saved before knitting to knit the most up to date version
rstudioapi::documentSave(rstudioapi::getActiveDocumentContext()$id)
#obtaining the file name of the currently active document
RMD_fileName <- str_replace(rstudioapi::getActiveDocumentContext()$path, paste0(sub("\\/[^\\/]*$", "",rstudioapi::getActiveDocumentContext()$path),"/"),"")
#setting up the file name and output directory to save the knitted file
outputFileName <- paste0(str_replace(RMD_fileName,".Rmd",""), "_KNITTED")
outputPath <- paste0(getwd(),str_replace(dir_output,".",""))
#if there is an RMD_fileName knit it
if(RMD_fileName!=""){
rmarkdown::render(input = RMD_fileName, output_file = outputFileName, output_dir = outputPath)
print(paste("Summary file:",outputFileName," is at",outputPath))
}else print("No RMD file retrieved")
#a function to display HTML content in the Rstudio viewer
viewerpane.html <- function(xfile, vsize=NULL){
# viewerpane.html was written by anwhite03 and published here:
# https://community.rstudio.com/t/rstudio-knit-explicit-r-command-for-preview-option-after-rmd-is-knit/11368
# Function: viewerpane.html version 1.00 23July2018
# Purpose: view RMarkdown Knit-generated html file in RStudio Viewer pane
# Status: Dev/Test
# Args:
# xfile = quoted name of html file (and path if not located in current directory)
# vsize = viewer arg height, default=NULL; alt values: "maximize", or numeric {3 to 8}
# Example: x <- "RMD-Demo-Viridis-002x.html"
# References:
# 1. https://rstudio.github.io/rstudio-extensions/rstudio_viewer.html
# 2. https://rstudio.github.io/rstudio-extensions/pkgdown/rstudioapi/reference/viewer.html
# 3. https://rstudio.github.io/rstudio-extensions/rstudioapi.html
#
# library(rstudioapi)
xfile.b <- basename(xfile)
tempDir <- tempfile()
dir.create(tempDir)
htmlFile <- file.path(tempDir, xfile.b)
# (code to write some content to the file) -- see next line
file.copy(xfile, htmlFile)
viewer <- getOption("viewer")
viewer(htmlFile, height = vsize)
}
#calling the function to display the HTML document
viewerpane.html(xfile = paste0(outputPath,outputFileName,".html"))
}

Generate TeX/LaTeX file and compile both in Python

I am preparing a data processing code and the results from these are to be fed to a TeX/LaTeX file and the file is to be compiled by Python (i.e. Python should send the command to compile the TeX/LaTeX file)
At present I plan to generate a text file (with .tex extension) with the necessary syntax for TeX/LaTeX and then call external system command using os.system. Is there any easier way or modules that can do this?

I am not aware of a Python module for this, but you could generate a text file with the .tex extension containing the required syntax and compile it by running a system command from Python:
import os
os.system("pdflatex filename")
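The same idea can be written with subprocess, which passes arguments as a list rather than a single shell string. This is only a sketch, assuming pdflatex is on the PATH; the template, the helper names and the "Result: ..." content are placeholders, and the value is inserted without any LaTeX escaping.

```python
import subprocess
from pathlib import Path

# Placeholder document; %s is filled in with the processed result.
TEX_TEMPLATE = r"""\documentclass{article}
\begin{document}
Result: %s
\end{document}
"""


def write_tex(path, result):
    """Write a minimal LaTeX document containing result; return the path."""
    path = Path(path)
    path.write_text(TEX_TEMPLATE % result, encoding="utf-8")
    return path


def pdflatex_command(tex_path):
    """Build the pdflatex command line for tex_path."""
    return ["pdflatex", "-interaction=nonstopmode", Path(tex_path).name]


def compile_tex(tex_path):
    """Run pdflatex in the directory that contains tex_path."""
    tex_path = Path(tex_path)
    # check=True raises CalledProcessError if pdflatex reports an error.
    subprocess.run(pdflatex_command(tex_path), cwd=tex_path.parent, check=True)
    return tex_path.with_suffix(".pdf")


# Only prints the command; call compile_tex(...) to actually run pdflatex.
print(pdflatex_command("document.tex"))
```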

You could use PyLaTeX for this. This is exactly one of the things that it is meant to be used for. https://github.com/JelteF/PyLaTeX

Make sure you have installed Perl and MiKTeX (for latexmk and pdflatex support) on your system.
If not, you can download Perl from https://www.perl.org/get.html
and MiKTeX from https://miktex.org/download.
Also, do not forget to check http://mg.readthedocs.io/latexmk.html#installation, as it is a good guide to the LaTeX compilers.
I have document.tex with the following content.
\documentclass{article}%
\usepackage[T1]{fontenc}%
\usepackage[utf8]{inputenc}%
\usepackage{lmodern}%
\usepackage{textcomp}%
\usepackage{lastpage}%
%
%
%
\begin{document}%
\normalsize%
\section{A section}%
\label{sec:A section}%
Some regular text and some %
\textit{italic text. }%
\subsection{A subsection}%
\label{subsec:A subsection}%
Also some crazy characters: \$\&\#\{\}
%
\end{document}
Finally, create a Python file, paste in either of the snippets below, and run it.
1st way
# 1st way
import os
tex_file_name = "document.tex"
os.system("latexmk " + tex_file_name + " -pdf")
2nd way
# 2nd way
import os
tex_file_name = "document.tex"
os.system("pdflatex " + tex_file_name)
For compiling complex LaTeX files, you need to look into the extra options that can be passed to the latexmk or pdflatex commands.
Thanks.

How to write OS X Finder Comments from python?

I'm working on a Python script that creates numerous image files based on a variety of inputs in OS X Yosemite. I am trying to write the inputs used to create each file as 'Finder comments' as each file is created, so that if the output is visually interesting I can look at the specific input values that generated the file. I've verified that this can be done easily with AppleScript.
tell application "Finder" to set comment of (POSIX file "/Users/mgarito/Desktop/Random_Pixel_Color/2015-01-03_14.04.21.png" as alias) to {Val1, Val2, Val3} as Unicode text
Afterward, upon selecting the file and showing its info (cmd+i), the Finder comments clearly display the expected text 'Val1, Val2, Val3'.
This is further confirmed by running mdls [File/Path/Name] before and after the applescript is used which clearly shows the expected text has been properly added.
The problem is I can't figure out how to incorporate this into my Python script.
I'm under the impression the solution should be something to the effect of:
VarList = [Var1, Var2, Var3]
File = [File/Path/Name]
file.os.system.add(kMDItemFinderComment, VarList)
As a side note, I've also looked at xattr -w [Attribute_Name] [Attribute_Value] [File/Path/Name] but found that, though this will store the attribute, it is not stored in the desired location. Instead it ends up in an affiliated plist, which is not what I'm after.

Here is my way to do that.
First you need to install the applescript package with pip install applescript.
Here is a function to add comments to a file:
def set_comment(file_path, comment_text):
    import applescript
    applescript.tell.app("Finder", f'set comment of (POSIX file "{file_path}" as alias) to "{comment_text}" as Unicode text')
and then I'm just using it like this:
set_comment('/Users/UserAccountName/Pictures/IMG_6860.MOV', 'my comment')

After more digging, I was able to locate a python applescript bundle: https://pypi.python.org/pypi/py-applescript
This got me to a workable answer, though I'd still prefer to do this natively in python if anyone has a better option?
import applescript
NewFile = '[File/Path/Name]'
Comment = "Almost there.."
AddComment = applescript.AppleScript('''
on run {arg1, arg2}
tell application "Finder" to set comment of (POSIX file arg1 as alias) to arg2 as Unicode text
return
end run
''')
print(AddComment.run(NewFile, Comment))
print("Done")
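If the goal is to avoid a third-party package, the same AppleScript can be handed to the osascript command-line tool that ships with macOS, using only the standard library. This is a sketch and not a pure-Python solution: it still goes through AppleScript and Finder. The helper names are made up, and it assumes osascript passes trailing command-line arguments to the script's run handler.

```python
import subprocess

# The AppleScript from the answer above, one line per -e option.
SCRIPT_LINES = [
    "on run {arg1, arg2}",
    'tell application "Finder" to set comment of (POSIX file arg1 as alias) to arg2',
    "end run",
]


def osascript_command(file_path, comment):
    """Build an osascript command line that sets a Finder comment."""
    command = ["osascript"]
    for line in SCRIPT_LINES:
        command += ["-e", line]
    # Trailing arguments become arg1 and arg2 of the run handler.
    return command + [str(file_path), str(comment)]


def set_finder_comment(file_path, comment):
    """Run the command (macOS only); raises CalledProcessError on failure."""
    subprocess.run(osascript_command(file_path, comment), check=True)


# Only prints the command; call set_finder_comment(...) on a Mac to run it.
print(osascript_command("/path/to/a/file", "Almost there.."))
```

Passing the path and comment as arguments, rather than formatting them into the script text, avoids having to escape quotes inside the comment.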

This is the function to get comment of a file.
def get_comment(file_path):
    import applescript
    return applescript.tell.app("Finder", f'get comment of (POSIX file "{file_path}" as alias)').out
print(get_comment('Your Path'))

Another approach is to use appscript, a high-level Apple event bridge that is sadly no longer officially supported but still works (and saw an updated release in Jan. 2021). Here is an example of reading and setting the comment on a file:
import appscript
import mactypes
# Get a handle on the Finder.
finder = appscript.app('Finder')
# Tell Finder to select the file.
file = finder.items[mactypes.Alias("/path/to/a/file")]
# Print the current comment
comment = file.comment()
print("Current comment: " + comment)
# Set a new comment.
file.comment.set("New comment")
# Print the current comment again to verify.
comment = file.comment()
print("Current comment: " + comment)
Although the author of appscript recommends against using it in new projects, I used it recently to create a command-line utility called Urial for the specialized purpose of writing and updating URIs in Finder comments. Perhaps its code can serve as an additional example of using appscript to manipulate Finder comments.

running python code, to use sextractor with more than one image file

I have written a piece of code which runs sextractor from Python; however, I only know how to do this for one file, and I need to loop it over 62 files. I'm not sure how I would go about doing this. I have attached my code below:
#!/usr/bin/env python
# build a catalog using sextractor on des image here
import os
import sys
sys.path.append('/home/fitsfiles') #not sure if this does anything/is correct
def sex(image, output, sexdir='/home/sextractor-2.5.0', check_img=None,config=None, l=None) :
    '''Construct a sextractor command and run it.'''
    #creates a sextractor line e.g sex img.fits -catalog_name -checkimage_name
    q="/home/fitsfiles/"+ "01" +".fits"
    com = [ "sex ", q, " -CATALOG_NAME " + output]
    s0=''
    com = s0.join(com)
    res = os.system(com)
    return res
img_name=sys.argv[0]
output=img_name[0:1]+'_star_catalog.fits'
t=sex(img_name,output)
print('----done !---')
So this code produces the following command in my main terminal: sex /home/fitsfiles/01.fits -CATALOG_NAME g_star_catalog.fits
which successfully produces a star catalogue as I want.
However, I want my code to do this for 62 fits files and change the name of star_catalog.fits depending upon which fits file is being used. Any help would be appreciated.

There are many ways you could approach this. Let's assume you want to run your script as something like
python extract_stars.py /home/fitsfiles/*.fits
Then, you could try something like this:
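# sys.argv[0] is the script name, so the file names start at index 1
for arg in sys.argv[1:]:
    # drop the directory and the trailing '.fits' extension
    filename = arg.split('/')[-1].rsplit('.fits', 1)[0]
    t = sex(arg, filename +'_star_catalog.fits')
    # Whatever else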
This assumes that you remove the line in sex that reformats the input filename. Also, you do not need to append the fits directory to your path.
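Putting those pieces together, a complete version of such a script might look like the sketch below. It assumes the `sex` executable is on the PATH and that only the -CATALOG_NAME option is needed, as in the question; the helper names are made up for illustration.

```python
import os
import subprocess
import sys


def catalog_name(image_path):
    """Derive '<stem>_star_catalog.fits' from the image file name."""
    stem = os.path.splitext(os.path.basename(image_path))[0]
    return stem + "_star_catalog.fits"


def sex_command(image_path):
    """Build the SExtractor command line for one image."""
    return ["sex", image_path, "-CATALOG_NAME", catalog_name(image_path)]


def main(paths):
    """Run SExtractor once per image path."""
    for path in paths:
        # check=True stops the loop if SExtractor exits with an error.
        subprocess.run(sex_command(path), check=True)


if __name__ == "__main__":
    # e.g. python extract_stars.py /home/fitsfiles/*.fits
    main(sys.argv[1:])
```

The shell expands the *.fits wildcard, so the script receives one argument per file.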
The alternative approach is, if you do not plan to do anything else in python, you could write a bash script which would really simplify the task.
And, as a side note, if you had asked this question more generally (i.e., I wish to apply a function I wrote to a number of input files) and without reference to a rather uncommonly used application, you would likely have received an answer much more quickly.

The community has now developed some python wrappers which allow you to run sextractor as if it was a python command. These are: pysex, sewpy and astromatic_wrapper.
The good thing about sextractor wrappers is that they allow you to write much cleaner code without the need to define extra functions, invoke OS commands, or keep the configuration files and the output files on your machine. Moreover, the output can be an astropy table, a pandas dataframe or a numpy array.
For your specific case, you could use pysex and do:
import pysex
import glob
filelist = glob.glob('/directory/*.fits')
for fitsfile in filelist:
    cat = pysex.run(fitsfile, params=['X_IMAGE', 'Y_IMAGE', 'FLUX_APER'],
                    conf_args={'PHOT_APERTURES':5})
    print(cat['FLUX_APER'])
