How to call R Markdown knit function from Python script


I have a Python script that creates multiple .Rmd files, and I wanted a way to automatically turn them into .html's without having to manually knit each within RStudio. I've probably spent around 4 hours researching and trying different options, and although I've managed to make it work by calling a .R script with
subprocess.Popen(['Rscript', '--vanilla', 'rmd2html.R'], shell=False)
that then does the knitting with
rmarkdown::render("dicionarioNew.Rmd", "html_document"),
this for some reason does not use UTF-8 (which I need) and doesn't easily allow me to store the number of times the program has been run (necessary for giving a different name to each html file).

Your title question is answered in your question's body, but you have more specific needs not met by your current implementation:
how to render .Rmd using UTF-8
how to render .Rmd to html with custom filename output
I suggest you try seeking out the answers to these individually.
As a general answer to this question however, I suggest you consider using the Rscript command to run a custom R script which does what you want based on the source of rmd2html.R. You might also use R -e to execute a line or few of R code hardcoded as a string in your python script.
If you want to break things out further in python, there are many options for rendering at the chunk or file level individually using the sweave, rmarkdown, stationary, and other R packages.
Given a more specific example of what you are trying to accomplish someone may be able to help point you towards which of these many options would be right for your use case.
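As a rough sketch of the Rscript -e route mentioned above, assuming Rscript is on the PATH and the rmarkdown package is installed. The helper names (next_run_number, build_render_command, knit) and the counter file run_count.txt are hypothetical and only serve the illustration. output_file and encoding are arguments of rmarkdown::render; recent rmarkdown versions read the input as UTF-8 regardless, so encoding may be redundant there.

```python
import json
import subprocess
from pathlib import Path


def next_run_number(counter_file: Path) -> int:
    """Read, increment and persist a run counter (hypothetical helper)."""
    count = int(counter_file.read_text()) if counter_file.exists() else 0
    count += 1
    counter_file.write_text(str(count))
    return count


def build_render_command(rmd_file: str, output_file: str) -> list[str]:
    """Build an Rscript call that knits one .Rmd file to HTML."""
    # json.dumps is used as a simple way to get double-quoted, escaped
    # string literals; this assumes ordinary file names.
    r_code = (
        "rmarkdown::render("
        f"input = {json.dumps(rmd_file)}, "
        'output_format = "html_document", '
        f"output_file = {json.dumps(output_file)}, "
        'encoding = "UTF-8")'
    )
    return ["Rscript", "--vanilla", "-e", r_code]


def knit(rmd_file: str, counter_file: Path = Path("run_count.txt")) -> str:
    """Knit rmd_file to a numbered HTML file and return the file name."""
    run = next_run_number(counter_file)
    output_file = f"{Path(rmd_file).stem}_{run}.html"
    # check=True raises CalledProcessError if R exits with an error.
    subprocess.run(build_render_command(rmd_file, output_file), check=True)
    return output_file
```

Calling knit("dicionarioNew.Rmd") once per generated file would then produce dicionarioNew_1.html, dicionarioNew_2.html and so on, with the counter surviving between runs of the Python script.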

An alternative is to add a chunk that will knit the .rmd file when it is executed. See my approach below - it also opens the .html file in the view panel of Rstudio after knitting. That way you can probably also build in the tracker of how many times the code is run and add that into the file-name.
For the "auto_knit_chunk" shown below to work you need to set up two variables, KNITreport (TRUE/FALSE) and the output directory (or remove their use if you don't need them), in a chunk above the "auto_knit_chunk".
#IN A CHUNK ABOVE THE auto_knit_chunk SET THE VARIABLES
#if this RMD file should be knitted, set KNITreport = TRUE
KNITreport <- TRUE
#set an output path
dir_output <- "./output/"
The "auto_knit_chunk" chunk should only be executed if knitr is not currently running, to avoid an infinite loop of calling knitr from each document that is knitted. So in the chunk header set: eval = !isTRUE(getOption('knitr.in.progress'))
# THIS IS THE HEADER ```{r, auto_knit_chunk, echo = FALSE, eval = !isTRUE(getOption('knitr.in.progress'))}
if(KNITreport)
{
#saving the currently open RStudio file - needs to be saved before knitting to knit the most up to date version
rstudioapi::documentSave(rstudioapi::getActiveDocumentContext()$id)
#obtaining the file name of the currently active document
RMD_fileName <- str_replace(rstudioapi::getActiveDocumentContext()$path, paste0(sub("\\/[^\\/]*$", "",rstudioapi::getActiveDocumentContext()$path),"/"),"")
#setting up the file name and output directory to save the knitted file
outputFileName <- paste0(str_replace(RMD_fileName,".Rmd",""), "_KNITTED")
outputPath <- paste0(getwd(),str_replace(dir_output,".",""))
#if there is an RMD_fileName knit it
if(RMD_fileName!=""){
rmarkdown::render(input = RMD_fileName, output_file = outputFileName, output_dir = outputPath)
print(paste("Summary file:",outputFileName," is at",outputPath))
}else print("No RMD file retrieved")
#a function to display HTML content in the Rstudio viewer
viewerpane.html <- function(xfile, vsize=NULL){
# viewerpane.html was written by anwhite03 and published here:
# https://community.rstudio.com/t/rstudio-knit-explicit-r-command-for-preview-option-after-rmd-is-knit/11368
# Function: viewerpane.html version 1.00 23July2018
# Purpose: view RMarkdown Knit-generated html file in RStudio Viewer pane
# Status: Dev/Test
# Args:
# xfile = quoted name of html file (and path if not located in current directory)
# vsize = viewer arg height, default=NULL; alt values: "maximize", or numeric {3 to 8}
# Example: x <- "RMD-Demo-Viridis-002x.html"
# References:
# 1. https://rstudio.github.io/rstudio-extensions/rstudio_viewer.html
# 2. https://rstudio.github.io/rstudio-extensions/pkgdown/rstudioapi/reference/viewer.html
# 3. https://rstudio.github.io/rstudio-extensions/rstudioapi.html
#
# library(rstudioapi)
xfile.b <- basename(xfile)
tempDir <- tempfile()
dir.create(tempDir)
htmlFile <- file.path(tempDir, xfile.b)
# (code to write some content to the file) -- see next line
file.copy(xfile, htmlFile)
viewer <- getOption("viewer")
viewer(htmlFile, height = vsize)
}
#calling the function to display the HTML document
viewerpane.html(xfile = paste0(outputPath,outputFileName,".html"))
}

Related

Refactor only selection black [duplicate]

We are not ready to automatically format the whole source code with black.
But from time to time I would like to execute black -S on a region via PyCharm.
There is a hint in the docs how to run black (or black -S (what I like)) on the whole file. But ...
How to run black only on a selected region?

Using Python Black on a code region in the PyCharm IDE can be done by implementing it as an external tool. Currently Black has two main options to choose the code to format
Run Black on the whole module specifying it on the CLI as the [SRC]...
Passing the code region as a string on the CLI using the -c, --code TEXT option.
The following implementation shows how to do this using the 2nd option. The reason is that applying Black to the whole module is likely to change the number of lines thus making the job of selecting the code region by choosing start and end line numbers more complicated.
Implementing the 1st option can be done, but it would require mapping the initial code region to the final code region after Black formats the entire module.
Let's take as an example the following code, which has a number of obvious PEP-8 violations (missing whitespace and empty lines):
"""
long multi-line
comment
"""
def foo(token:int=None)->None:
    a=token+1
class bar:
    foo:int=None
def the_simple_test():
    """the_simple_test"""
    pass
Step 1.
Using Black as an external tool in the IDE can be configured by going to File > Settings > Tools > External Tools and clicking the Add or Edit icons.
What is of interest is passing the right macros (see point 3, "Parameter with macros") from the PyCharm IDE to the custom script that calls Black and does the necessary processing. Namely, you'll need the macros
FilePath - File Path
SelectionStartLine - Selected text start line number
SelectionEndLine - Select text end line number
PyInterpreterDirectory - The directory containing the Python interpreter selected for the project
Any additional Black CLI options you want to pass as arguments are best placed at the end of the parameter list.
Since you may have Black installed on a specific venv, the example also uses the PyInterpreterDirectory macro.
The screenshot illustrates the above:
Step 2.
You'll need to implement a script to call Black and interface with the IDE. The following is a working example. It should be noted:
Four lines are OS/shell specific as commented (it should be trivial to adapt them to your environment).
Some details could be further tweaked, for purpose of example the implementation makes simplistic choices.
import os
import pathlib
import subprocess
import sys
import tempfile


def region_to_str(file_path: pathlib.Path, start_line: int, end_line: int) -> str:
    """Return lines start_line..end_line (1-based, inclusive) of file_path."""
    str_build = list()
    with open(file_path) as file:
        for line_number, line in enumerate(file, start=1):
            if line_number > end_line:
                break
            elif line_number < start_line:
                continue
            else:
                str_build.append(line)
    return "".join(str_build)


def black_to_clipboard(py_interpreter, black_cli_options, code_region_str):
    py_interpreter_path = pathlib.Path(py_interpreter) / "python.exe"  # OS specific, .exe for Windows.
    proc = subprocess.Popen([py_interpreter_path, "-m", "black", *black_cli_options,
                             "-c", code_region_str], stdout=subprocess.PIPE)
    try:
        outs, errs = proc.communicate(timeout=15)
    except subprocess.TimeoutExpired:
        proc.kill()
        outs, errs = proc.communicate()
    # The pipe returns bytes; decode them as UTF-8, the default Python source encoding.
    result = outs.decode('utf-8').replace('\r', '')  # OS specific, remove \r from \r\n Windows new-line.
    tmp_file = tempfile.gettempdir() + "\\__run_black_tmp.txt"  # OS specific, escaped path separator.
    with open(tmp_file, mode='w+', encoding='utf-8', errors='strict') as out_file:
        out_file.write(result + '\n')
    command = 'clip < ' + str(tmp_file)  # OS specific, send result to clipboard for copy-paste.
    os.system(command)


def main(argv: list[str] = sys.argv[1:]) -> int:
    """External tool script to run black on a code region.
    Args:
        argv[0] (str): Path to module containing code region.
        argv[1] (str): Code region start line.
        argv[2] (str): Code region end line.
        argv[3] (str): Path to venv /Scripts directory.
        argv[4:] (str): Black CLI options.
    """
    # print(argv)
    lines_as_str = region_to_str(argv[0], int(argv[1]), int(argv[2]))
    black_to_clipboard(argv[3], argv[4:], lines_as_str)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
Step 3.
The hard part is done. Let's use the new functionality.
Normally select the lines you want as your code region in the editor. This has to be emphasized because the previous SelectionStartLine and SelectionEndLine macros need the selection to work. (See the next screenshot).
Step 4.
Run the external tool previously implemented. This can be done by right clicking in the editor and choosing External Tools > the_name_of_your_external_tool.
Step 5.
Simply paste (the screenshot shows the result after running the external tool and pressing Ctrl + v). The implementation in Step 2 copies Black's output to your OS's clipboard. This seemed like the preferable solution, since this way you change the file inside the editor and Undo (Ctrl + z) will also work. Changing the file by overwriting it programmatically outside the editor would be less seamless and might require refreshing it inside the editor.
Step 6.
You can record a macro of the previous steps and associate it with a keyboard shortcut to have the above functionality in one keystroke (similar to copy-paste Ctrl + c + Ctrl + v).
End Notes.
If you need to debug the functionality in Step 2 a Run Configuration can also be configured using the same macros the external tool configuration did.
It's important to notice when using the clipboard that character encodings can change across the layers. I decided to use clip and read into it directly from a temporary file, this was to avoid passing the code string to Black on the command line because the CMD Windows encoding is not UTF-8 by default. (For Linux users this should be simpler but can depend on your system settings.)
One important note is that you can choose a code region without the broader context of its indentation level. Meaning, for example, if you only choose 2 methods inside a class they will be passed to Black and formatted with the indentation level of module level functions. This shouldn't be a problem if you are careful to select code regions with their proper scope. This could also easily be solved by passing the additional macro SelectionStartColumn - Select text start column number from Step 1 and prepending that number of whitespaces to each line in the Step 2 script. (Ideally such functionality would be implemented by Black as a CLI option.) In any case, if needed, using Tab to put the region in its proper indentation level is easy enough.
The main topic of the question is how to integrate Black with the PyCharm IDE for a code region, so demonstrating the 2nd option should be enough to address the problem because the 1st option would, for the most part, only add implementation-specific complexity. (The answer is long enough as it is. The specifics of implementing the 1st option would make a good Feature/Pull Request for the Black project.)
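The indentation issue from the end notes can be sketched as follows. This is only one possible approach and swaps in a different technique from the SelectionStartColumn macro suggested above: it uses textwrap.dedent to strip the common indentation before the region is handed to Black, and textwrap.indent to put it back afterwards. The helper names split_indent and restore_indent are hypothetical.

```python
import textwrap


def split_indent(region: str) -> tuple[str, str]:
    """Return (common_indent, dedented_region) for a selected code region."""
    dedented = textwrap.dedent(region)
    # Compare the first non-blank line before and after dedenting to
    # recover exactly the whitespace that was removed.
    first_src = next((ln for ln in region.splitlines() if ln.strip()), "")
    first_out = next((ln for ln in dedented.splitlines() if ln.strip()), "")
    indent = first_src[: len(first_src) - len(first_out)]
    return indent, dedented


def restore_indent(formatted: str, indent: str) -> str:
    """Put the original indentation back on every non-blank line."""
    return textwrap.indent(formatted, indent)
```

In the Step 2 script, the dedented text would be what is passed to black -c, and restore_indent would be applied to Black's output before it is written to the clipboard file.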

I have researched this because it actually looks interesting, and I've come to the conclusion that you can maybe use:
black -S and_your_file_path
or:
black -c and_a_string
to format the code passed in as a string.
I will also follow this thread because it looks interesting.
And I'm also going to do more research on this and if I find something I will let you know.

Python can't read a temporary file

I have a python program, which is supposed to calculate changes based on a value written in a temporary file (eg. "12345\n"). It is always an integer.
I have tried different methods to read the file, but Python wasn't able to read it. So then I had the idea to execute a shell command ("cat") that will return the content. When I execute this in the shell it works fine, but in Python the output I get is empty. Then I tried writing a bash script and then a PHP script, which would read the file and return the value. In Python I called them via the shell, and the output I get is empty as well.
I was wondering if that was a general problem in python and made my scripts return the content of other temporary files, which worked fine.
Inside my scripts I was able to do calculations with the value and in the shell the output is exactly as expected, but not when called via Python. I also noticed that I don't get the value with my extra scripts when they are called by Python (I tried to write it into another file; it was updated but empty).
The file I am trying to read is in the /tmp directory and is written to several times per second by another script.
I am looking for a solution (open for new ideas) in which I end up having the value of the file in a python variable.
Thanks for the help
Here are my programs:
python:
# python script
import subprocess
stdout = subprocess.Popen(["php /path/to/my/script.php"], shell = True, stdout = subprocess.PIPE).communicate()[0].decode("utf-8")
# other things I tried
#with open("/tmp/value.txt", "r") as file:
# stdout = file.readline() # output = "--"
#stdout = os.popen("cat /tmp/value.txt").read() # output = "--"
#stdout = subprocess.check_output(["php /path/to/my/script.php"], shell = True, stdout = subprocess.PIPE).decode("utf-8") # output = "--"
print(str("-" + stdout + "-")) # output = "--"
php:
# php script
$valueFile = fopen("/tmp/value.txt", "r");
$value = trim(fgets($valueFile), "\n");
fclose($valueFile);
echo $value; # output in the shell is the value of $value
Edit: context: my python script is started by another python script, which listens for commands from an apache server on the pi. The value I want to read comes from a "1wire" device that listens for S0-signals.
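The question does not establish why the reads come back empty. One possibility, stated here purely as an assumption, is that the reader sometimes opens the file just after the writer has truncated it and before the new value has been written. Under that assumption, a minimal sketch is to retry until a non-empty integer is read. The function name read_int_with_retry and its parameters are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def read_int_with_retry(path: str, attempts: int = 20,
                        delay: float = 0.05) -> Optional[int]:
    """Return the integer stored in path, retrying while the file is empty."""
    for _ in range(attempts):
        try:
            text = Path(path).read_text().strip()
        except OSError:
            text = ""  # file missing or unreadable at this instant
        if text:
            try:
                return int(text)
            except ValueError:
                pass  # possibly a partially written value, try again
        time.sleep(delay)
    return None  # no usable value within the allowed attempts
```

If the writing script can be changed, having it write to a separate temporary file and then move that file over /tmp/value.txt with os.replace would, on POSIX systems, remove the window in which the file is empty.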

How to grab contents of a 'specific' output variable from python process, to a powershell script variable? if the python code has multiple outputs

I was trying to automate the launching of a python code for an application. The code has multiple output variables and I only want to grab a single variable's content of the code to a powershell variable. Can this be done? I am very new to powershell.
I was trying to use System.Diagnostics.Process, but seem to have hit a roadblock. Any clue as to how this can be done?
text=r'Text'
image=r'Path\to\image'
lang = r'lang'
read_Text = r'Text_from_image'
s= SequenceMatcher(None, read_Text,text)
r = round(s.ratio(),3)
print (r)
So, as we can see there are multiple variables storing values. Only the last line is outputting to the stream and is grabbed when I use
$stdout = $Process.StandardOutput.ReadToEnd().
PS $HOME> $HOME\Time.ps1
65.2252356
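Since only what is printed reaches standard output, one way to expose several variables is to print them together as a single JSON object and let the caller pick the field it needs. This is a sketch under that assumption and not something the question confirms will fit the application. The function name compare_text and the key names are hypothetical.

```python
import json
from difflib import SequenceMatcher


def compare_text(read_text: str, text: str) -> dict:
    """Collect every value the caller might want in one dictionary."""
    ratio = round(SequenceMatcher(None, read_text, text).ratio(), 3)
    return {"text": text, "read_text": read_text, "ratio": ratio}


if __name__ == "__main__":
    # One JSON object on stdout; the caller selects a field by name.
    print(json.dumps(compare_text("Text_from_image", "Text")))
```

On the PowerShell side, the string returned by $Process.StandardOutput.ReadToEnd() can then be piped to ConvertFrom-Json and the wanted value read as a property, for example the ratio field.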

Difficult workflow writing Latex book full of Python code

I'm writing a book on coding in python using Latex. I plan on having a lot of text with python code interspersed throughout, along with its output. What's really giving me trouble is when I need to go back and edit my python code, it's a huge pain to get it back nicely into my latest document.
I've done a whole lot of research and can't seem to find a good solution.
This one includes full files as one, doesn't solve my issues
https://tex.stackexchange.com/questions/289385/workflow-for-including-jupyter-aka-ipython-notebooks-as-pages-in-a-latex-docum
Same with this one.
http://blog.juliusschulz.de/blog/ultimate-ipython-notebook
Found Solution 1 (awful)
I can copy and paste python code into latex ok using the listings latex package.
Pros:
Easy to update only small section of code.
Cons:
For output need to run in python, copy, paste separately.
Initial writing SLOW, need to do this process hundreds of times per chapter.
Found Solution 2 (bad)
Use jupyter notebook with markdown, export to Latex, \include file into main Latex document.
Pros:
Streamlined
Has output contained within.
Cons:
To make small changes, need to reimport whole document, any changes made to markdown text within Latex editor are not saved
Renaming a single variable in python after jupyter notebook could take hours.
Editing seems like a giant chore.
Ideal solution
Write Text in Latex
Write python in jupyter notebook, export to latex.
Somehow include code snippets (small sections of the exported file) into different parts of the main latex book. This is the part I can't figure out
When python changes are needed, changes in jupyter, then re-export as latex file with same name
Latex book is automatically updated from includes.
The key here is that the exported python notebook is being split up and sent to different parts of the document. In order for that to work it needs to somehow be tagged or marked in the markdown or code of the notebook, so when I re-export it those same parts get sent to the same spots in the book.
Pros:
Python edits easy, easily propagated back to book.
Text written in latex, can use power of latex
Any help in coming up with a solution closer to my ideal solution would be much appreciated. It's killing me.
Probably doesn't matter, but I'm coding both latex and jupyter notebooks in VS Code. I'm open to changing tools if it means solving these problems.

Here's a small script I wrote. It splits a single *.ipynb file and converts it to multiple *.tex files.
Usage is:
copy following script and save as something like main.py
execute python main.py init. it will create main.tex and style_ipython_custom.tplx
in your Jupyter notebook, add an extra line #latex:tag_a, #latex:tag_b, .. to each cell which you want to extract. Cells with the same tag will be extracted to the same *.tex file.
save it as *.ipynb file. fortunately, current VSCode python plugin supports exporting to *.ipynb, or use jupytext to convert from *.py to *.ipynb.
run python main.py path/to/your.ipynb and it will create tag_a.tex and tag_b.tex
edit main.tex and add \input{tag_a.tex} or \input{tag_b.tex} where ever you want.
run pdflatex main.tex and it will produce main.pdf
The idea behind this script:
Converting from a Jupyter notebook to LaTeX using the default nbconvert.LatexExporter produces a complete LaTeX file which includes macro definitions. Using it to convert each cell may create large LaTeX files. To avoid the problem, the script first creates main.tex, which has only the macro definitions, and then converts each cell to a LaTeX file which has no macro definitions. This can be done using a custom template file which is slightly modified from style_ipython.tplx
Tagging or marking the cell might be done using cell metadata, but I could not find how to set it in VSCode python plugin (Issue), so instead it scans source of each cell with regex pattern ^#latex:(.*), and remove it before converting it to LaTex file.
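The tag scanning described above can be tried in isolation on plain strings, without nbformat or nbconvert. The sketch below reuses the same regular expression as the script; the function name group_sources and the sample cell sources are made up for illustration.

```python
import re
from collections import defaultdict

# Same pattern as in the script: a line that starts with "#latex:".
TAG_PATTERN = re.compile(r'^#latex:(.*?)$(\n?)', re.M)


def group_sources(sources: list[str]) -> dict[str, list[str]]:
    """Group cell sources by their #latex: tag and strip the tag line."""
    groups = defaultdict(list)
    for source in sources:
        match = TAG_PATTERN.search(source)
        if match is None:
            continue  # untagged cells are skipped
        tag = match.group(1).strip()
        groups[tag].append(source[:match.start()] + source[match.end():])
    return dict(groups)


cells = ["#latex:tag_a\nx = 1", "y = 2", "#latex:tag_b\nw = 4"]
print(group_sources(cells))
```

Because the pattern is compiled with re.M, the tag line may appear anywhere in the cell, not only on the first line.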
Source:
import sys
import re
import os
from collections import defaultdict
import nbformat
from nbconvert import LatexExporter, exporters

OUTPUT_FILES_DIR = './images'
CUSTOM_TEMPLATE = 'style_ipython_custom.tplx'
MAIN_TEX = 'main.tex'


def create_main():
    # creates `main.tex` which only has macro definition
    latex_exporter = LatexExporter()
    book = nbformat.v4.new_notebook()
    book.cells.append(
        nbformat.v4.new_raw_cell(r'\input{__your_input__here.tex}'))
    (body, _) = latex_exporter.from_notebook_node(book)
    with open(MAIN_TEX, 'x') as fout:
        fout.write(body)
    print("created:", MAIN_TEX)


def init():
    create_main()
    latex_exporter = LatexExporter()
    # copy `style_ipython.tplx` in `nbconvert.exporters` module to current directory,
    # and modify it so that it does not contain macro definition
    tmpl_path = os.path.join(
        os.path.dirname(exporters.__file__),
        latex_exporter.default_template_path)
    src = os.path.join(tmpl_path, 'style_ipython.tplx')
    target = CUSTOM_TEMPLATE
    with open(src) as fsrc:
        with open(target, 'w') as ftarget:
            for line in fsrc:
                # replace the line so that it does not contain macro definition
                if line == "((*- extends 'base.tplx' -*))\n":
                    line = "((*- extends 'document_contents.tplx' -*))\n"
                ftarget.write(line)
    print("created:", CUSTOM_TEMPLATE)


def group_cells(note):
    # scan the cell source for tag with regexp `^#latex:(.*)`
    # if same tags are found group them into the same list
    pattern = re.compile(r'^#latex:(.*?)$(\n?)', re.M)
    group = defaultdict(list)
    for num, cell in enumerate(note.cells):
        m = pattern.search(cell.source)
        if m:
            tag = m.group(1).strip()
            # remove the line which contains tag
            cell.source = cell.source[:m.start(0)] + cell.source[m.end(0):]
            group[tag].append(cell)
        else:
            print("tag not found in cell number {}. ignore".format(num + 1))
    return group


def doit():
    with open(sys.argv[1]) as f:
        note = nbformat.read(f, as_version=4)
    group = group_cells(note)
    latex_exporter = LatexExporter()
    # use the template which does not contain LaTeX macro definition
    latex_exporter.template_file = CUSTOM_TEMPLATE
    try:
        os.mkdir(OUTPUT_FILES_DIR)
    except FileExistsError:
        pass
    for (tag, g) in group.items():
        book = nbformat.v4.new_notebook()
        book.cells.extend(g)
        # unique_key will be prefix of image
        (body, resources) = latex_exporter.from_notebook_node(
            book,
            resources={
                'output_files_dir': OUTPUT_FILES_DIR,
                'unique_key': tag
            })
        ofile = tag + '.tex'
        with open(ofile, 'w') as fout:
            fout.write(body)
        print("created:", ofile)
        # the image data which is embedded as base64 in notebook
        # will be decoded and returned in `resources`, so write it to file
        for filename, data in resources.get('outputs', {}).items():
            with open(filename, 'wb') as fres:
                fres.write(data)
            print("created:", filename)


if len(sys.argv) <= 1:
    print("USAGE: this_script [init|yourfile.ipynb]")
elif sys.argv[1] == "init":
    init()
else:
    doit()

Jupyter does not allow exporting specific cells from a notebook -- it only allows you to export the entire notebook. To get as close to your ideal scenario as possible, you need a modular Jupyter set-up:
Split your single Jupyter notebook into smaller notebooks.
Each notebook can then be exported to LaTeX via File > Download as > LaTeX (.tex)
In LaTeX, you can then import the generated .tex file via
\input{filename.tex}
If you want to import the smaller notebooks into cells of your main notebook, you can do this via (see magic command run)
%run my_other_notebook.ipynb #or %run 'my notebook with spaces.ipynb'
You can also insert python files via (see magic command load)
%load python_file.py
which loads the Python file and allows you to execute it in your main notebook.
You can also have small .py snippets, load them in your small Jupyter notebook, and then run that small notebook in your larger one.
Your use of VS Code is fine, though Jupyter in the browser may be faster for you to edit.
(reference for all magic commands)

I would use bookdown to have both text and source code in the same document (split over several files for convenience). This package originates in the R world, but is also usable together with other languages. Here is a very simple example:
---
output: bookdown::pdf_document2
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Setup data
First we define some variables with data.
```{python data}
bob = ['Bob Smith', 42, 30000, 'software']
sue = ['Sue Jones', 45, 40000, 'music']
```
# Output data
Then we output some of the data.
```{python output}
bob[0], sue[2]
```
# Reference code block
Finally, let's repeat the code block without evaluating it.
```{python, ref.label="output", eval = FALSE}
```

running python code, to use sextractor with more than one image file

I have written a piece of code which runs sextractor from Python. However, I only know how to do this for one file, and I need to loop it over 62 files. I'm not sure how I would go about doing this. I have attached my code below:
#!/usr/bin/env python
# build a catalog using sextractor on des image here
import os
import sys
sys.path.append('/home/fitsfiles') #not sure if this does anything/is correct
def sex(image, output, sexdir='/home/sextractor-2.5.0', check_img=None,config=None, l=None) :
    '''Construct a sextractor command and run it.'''
    #creates a sextractor line e.g sex img.fits -catalog_name -checkimage_name
    q="/home/fitsfiles/"+ "01" +".fits"
    com = [ "sex ", q, " -CATALOG_NAME " + output]
    s0=''
    com = s0.join(com)
    res = os.system(com)
    return res
img_name=sys.argv[0]
output=img_name[0:1]+'_star_catalog.fits'
t=sex(img_name,output)
print('----done !---')
This code runs the following command in my terminal: sex /home/fitsfiles/01.fits -CATALOG_NAME g_star_catalog.fits
This successfully produces a star catalogue as I want.
However, I want my code to do this for 62 FITS files and change the name of star_catalog.fits depending on which FITS file is being used. Any help would be appreciated.

There are many ways you could approach this. Let's assume you want to run your script as something like
python extract_stars.py /home/fitsfiles/*.fits
Then, you could try something like this:
for arg in sys.argv[1:]:  # skip sys.argv[0], the script name
    # drop the directory and the .fits extension
    filename = os.path.splitext(os.path.basename(arg))[0]
    t = sex(arg, filename + '_star_catalog.fits')
    # Whatever else
This assumes that you remove the line in sex that reformats the input filename. Also, you do not need to append the fits directory to your path.
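A fuller sketch of the same loop, with the hard-coded path removed from the sex function as described above. It is a variant and not the asker's code: it passes the command to subprocess.run as an argument list instead of building a string for os.system, and it assumes the sex executable is on the PATH as in the question. The names build_sex_command and run_all are hypothetical.

```python
import glob
import os
import subprocess


def build_sex_command(image: str) -> list[str]:
    """Build the sextractor command for one FITS image."""
    stem = os.path.splitext(os.path.basename(image))[0]
    return ["sex", image, "-CATALOG_NAME", stem + "_star_catalog.fits"]


def run_all(pattern: str) -> None:
    """Run sextractor once per file matching pattern."""
    for image in sorted(glob.glob(pattern)):
        # check=True stops the loop if sextractor reports a failure.
        subprocess.run(build_sex_command(image), check=True)
```

Calling run_all('/home/fitsfiles/*.fits') would process every FITS file in that directory and name each catalogue after its input, e.g. 01_star_catalog.fits for 01.fits.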
The alternative approach is, if you do not plan to do anything else in python, you could write a bash script which would really simplify the task.
And, as a side note, if you had asked this question more generally (i.e., I wish to apply a function I wrote to a number of input files) and without reference to a rather uncommonly used application, you would likely have received an answer much more quickly.

The community has now developed some python wrappers which allow you to run sextractor as if it was a python command. These are: pysex, sewpy and astromatic_wrapper.
The good thing about sextractor wrappers is that they allow you to write much cleaner code without the need to define extra functions, invoke OS commands, or keep the configuration files and the output files on your machine. Moreover, the output can be an astropy table, a pandas dataframe or a numpy array.
For your specific case, you could use pysex and do:
import pysex
import glob
filelist = glob.glob('/directory/*.fits')
for fitsfile in filelist:
    cat = pysex.run(fitsfile, params=['X_IMAGE', 'Y_IMAGE', 'FLUX_APER'],
                    conf_args={'PHOT_APERTURES':5})
    print(cat['FLUX_APER'])
