I would like to render a pandas dataframe to HTML the same way the Jupyter Notebook does, i.e. with all the bells and whistles: nice-looking styling, column highlighting, and column sorting on click.
DataFrame.to_html outputs just a plain HTML table and requires manual styling etc.
Is the dataframe rendering code used by jupyter available as a standalone module that can be used in any web app?
Also, are the assets such as js/css files decoupled from jupyter so that they can be easily reused?
This works well for me
import pandas as pd

def getTableHTML(df):
    """
    From https://stackoverflow.com/a/49687866/2007153
    Get a Jupyter-like HTML rendering of a pandas dataframe
    """
    styles = [
        # table properties
        dict(selector=" ",
             props=[("margin", "0"),
                    ("font-family", '"Helvetica", "Arial", sans-serif'),
                    ("border-collapse", "collapse"),
                    ("border", "none"),
                    # ("border", "2px solid #ccf")
                    ]),
        # header color - optional
        # dict(selector="thead",
        #      props=[("background-color", "#cc8484")
        #             ]),
        # background shading
        dict(selector="tbody tr:nth-child(even)",
             props=[("background-color", "#fff")]),
        dict(selector="tbody tr:nth-child(odd)",
             props=[("background-color", "#eee")]),
        # cell spacing
        dict(selector="td",
             props=[("padding", ".5em")]),
        # header cell properties
        dict(selector="th",
             props=[("font-size", "100%"),
                    ("text-align", "center")]),
    ]
    # Styler.render() was removed in pandas 2.0; Styler.to_html() (pandas 1.3+) replaces it
    return df.style.set_table_styles(styles).to_html()

iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
getTableHTML(iris)
Some points to clarify first:
Pandas doesn't have anything to do with the styling, and the styling happens to all HTML tables, not only dataframes. That is easily checked by displaying an HTML table in Jupyter (example code at the end of the answer).
It seems that your Jupyter or one of your installed extensions is doing "extra" styling: the default style doesn't include column sorting or column highlighting. There is only odd/even row coloring and row highlighting (checked against the Jupyter source code and my local Jupyter installation). This means that my answer may not include all the styling you want.
The Answer
Is the dataframe rendering code used by jupyter available as a standalone module that can be used in any web app?
Not exactly a standalone module, but all the table formatting and styling seems to be attached to the rendered_html class. I double-checked that by inspecting the notebook HTML in Firefox.
You can use the .less file linked above directly or copy the required styles to your HTML.
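To illustrate the "copy the required styles to your HTML" route, here is a minimal sketch. The CSS values are my own approximation of a Jupyter-like look, not a verbatim copy of the Jupyter stylesheet, and the names JUPYTER_LIKE_CSS and wrap_table are made up for this example.

```python
import html

# Assumption: approximate Jupyter-like table rules, scoped to the rendered_html class.
JUPYTER_LIKE_CSS = """
.rendered_html table { border: none; border-collapse: collapse; font-size: 12px; }
.rendered_html thead { border-bottom: 1px solid black; vertical-align: bottom; }
.rendered_html th, .rendered_html td { text-align: right; padding: 0.5em; border: none; }
.rendered_html tbody tr:nth-child(odd) { background: #f5f5f5; }
.rendered_html tbody tr:hover { background: rgba(66, 165, 245, 0.2); }
"""


def wrap_table(table_html, title="Table"):
    """Return a standalone HTML page with the table inside a rendered_html div."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title>"
        f"<style>{JUPYTER_LIKE_CSS}</style></head>"
        f'<body><div class="rendered_html">{table_html}</div></body></html>'
    )


# Any plain HTML table works as input, for example the output of DataFrame.to_html().
page = wrap_table("<table><tr><td>1</td></tr></table>", title="Demo")
```

With pandas available, wrap_table(df.to_html()) should give a page that is styled without any Jupyter dependency. Sorting on click would still need separate JavaScript.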
Also, are the assets such as js/css files decoupled from jupyter so that they can be easily reused?
Like any well-designed web project (and actually any software project), the packages and modules are well separated. This means that you can re-use a lot of the code in your project with minimal effort. You can find most of the .less styling files in Jupyter source code here.
An example to check if the styling happens to all HTML tables:
from IPython.display import HTML
HTML('''<table>
<thead><tr><th></th><th>a</th><th>b</th></tr></thead>
<tbody>
<tr><th>0</th><td>1</td><td>3</td></tr>
<tr><th>1</th><td>2</td><td>4</td></tr>
</tbody>
</table>''')
Related
As an accountant, I produce A4 PDF financial reports for clients. The report contains a PDF cover page design, table of contents, blocks of text and many tables of financial data.
To date I have used a mixture of Microsoft Excel and Word to produce these reports, then saved them as PDF and added the PDF cover. The major disadvantage is that I have to edit the tables manually; I would much rather create automated reports based on existing data exported from my accounting software.
I would like to move away from Excel-Word and move towards (semi-) automating this through python (potentially pandas and markdown packages) - with markdown or html.
Previously I used LaTeX to produce these reports, however I found LaTeX challenging if something went wrong, the errors are difficult to understand and even basic table production can be challenging.
I am trying to plan out how I could bring together python-markdown-html/css. I was wondering if anyone else had experience in producing A4 reports in this way and any advice that they could offer. Initially I was drawn to having text saved as .md files and data stored in either mongoDB, pandas dataframes or simply CSV. I would then use the combination of .md and the data to produce a complete report in HTML. However, could HTML be converted into A4 PDF easily? I understand that there are now page CSS functionality for printing, but is this applicable? How would you suggest I can automate the creation of A4 PDF reports?
To answer your questions plainly:
However, could HTML be converted into A4 PDF easily?
Yes, this is possible using pandoc.
I understand that there are now page CSS functionality for printing, but is this applicable?
Not needed if you use a pandoc template, but possible if desired.
How would you suggest I can automate the creation of A4 PDF reports?
I suggest using pandoc and pandoc templates. This will allow you to convert a file containing a mixture of markdown, LaTeX, HTML, and whatever else you would like directly into a PDF.
More details on how:
Pandoc is a document conversion tool that can do this job very well. It will allow you to convert from html or markdown or LaTeX or a mix of all 3 into pdf or a number of other desired formats. For additional control on how the output looks, you can use a pandoc template. You can find information on how to create a custom template here. Here is an example of how that command works:
pandoc /filepath/doc_name.md -o doc_name.pdf --template /file_path/pandoc-templates/article.latex
This process can be automated with some further effort. You could, for example, write some Python code to generate your graphs or tables from source CSV files, then have that code call your pandoc command and build a document.
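A rough sketch of that automation, using only the standard library. The function names and the report.md / report.pdf file names are invented for this example, and the -V papersize=a4 variable assumes pandoc's LaTeX-based PDF output; adjust it to your template.

```python
import csv
import io
import shutil
import subprocess
from pathlib import Path


def csv_to_markdown_table(csv_text):
    """Turn CSV text (first row is the header) into a Markdown pipe table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def pandoc_command(md_path, pdf_path, template=None):
    """Build the pandoc call; papersize=a4 is an assumption for A4 output."""
    cmd = ["pandoc", str(md_path), "-o", str(pdf_path), "-V", "papersize=a4"]
    if template:
        cmd += ["--template", str(template)]
    return cmd


def build_report(intro_md, csv_text, out_dir, template=None):
    """Write report.md (text plus table) and convert it if pandoc is installed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    md_path = out_dir / "report.md"
    table = csv_to_markdown_table(csv_text)
    md_path.write_text(intro_md + "\n\n" + table + "\n", encoding="utf-8")
    cmd = pandoc_command(md_path, out_dir / "report.pdf", template)
    if shutil.which("pandoc"):  # skip the conversion when pandoc is missing
        subprocess.run(cmd, check=True)
    return md_path, cmd
```

The text blocks stay in Markdown and the figures come from CSV exports, so re-running build_report after a new export regenerates the whole document.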
Here is how I convert my ipython files with graphics outputs and tables into nice looking PDF files, hiding the code segments:
First install jupyter_contrib_nbextensions with
pip install jupyter_contrib_nbextensions
and the wkhtmltopdf tool from https://wkhtmltopdf.org/downloads.html
For example, I use macOS, so I had to install
wkhtmltox-0.12.6-2.macos-cocoa.pkg
from this site.
Now convert your file outputs to HTML hiding your code:
jupyter nbconvert --no-input --to html A4_REPORT.ipynb
(A4_REPORT.ipynb is the notebook you should already have prepared: one that generates some kind of table or graph, or contains inline markdown segments, and that runs in Jupyter Notebook.)
Now convert this HTML to PDF:
wkhtmltopdf A4_REPORT.html A4_REPORT.pdf
DONE !
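If you want to run both steps in one go, something like the following might do. It is only a sketch: report_commands and build_pdf are names I made up, and the explicit --page-size A4 just spells out the paper size.

```python
import shutil
import subprocess
from pathlib import Path


def report_commands(notebook):
    """Return the two command lines: notebook to HTML, then HTML to PDF."""
    notebook = Path(notebook)
    html_path = notebook.with_suffix(".html")
    pdf_path = notebook.with_suffix(".pdf")
    return [
        ["jupyter", "nbconvert", "--no-input", "--to", "html", str(notebook)],
        ["wkhtmltopdf", "--page-size", "A4", str(html_path), str(pdf_path)],
    ]


def build_pdf(notebook):
    """Run both steps in order, failing early if a tool is not installed."""
    for cmd in report_commands(notebook):
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} is not on PATH")
        subprocess.run(cmd, check=True)
```

Calling build_pdf("A4_REPORT.ipynb") should leave A4_REPORT.html and A4_REPORT.pdf next to the notebook.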
If I have a Python function that can take text, parse it, and generate formatted HTML (or re-formatted text) as output, is there any way of adding that as a custom cell format to Jupyter?
I would like to create a custom markup format for register definitions and have it displayed as pretty HTML/SVG but have the source remain text.
Thanks
EXTRA: I read a bit more, and although I see input cells that can go on to generate HTML output, there seems to be nothing that allows the output to hide the input, in the same way that Markdown HTML replaces its source when not editing.
Here is a combination of answers that should get you what you're looking for. Using this answer as a guide, you can have IPython output HTML using display:
from IPython.display import display, HTML
html_custom = '<h1>%s</h1>' % 'Whatever you want'
display(HTML(html_custom))
That allows you to use python to read in whatever text you need to and format it as you need.
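As an illustration of the "read text, format it as HTML" step, here is a small sketch. The "name | bits | description" line format and the register_table_html name are invented for this example; the question does not say what the real register markup looks like.

```python
import html


def register_table_html(text):
    """Render 'name | bits | description' lines (hypothetical format) as an HTML table."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        name, bits, description = (part.strip() for part in line.split("|", 2))
        rows.append(
            "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
                html.escape(name), html.escape(bits), html.escape(description)
            )
        )
    header = "<tr><th>Field</th><th>Bits</th><th>Description</th></tr>"
    return "<table>" + header + "".join(rows) + "</table>"


source = """
# control register
EN   | 0   | Enable the block
MODE | 2:1 | Operating mode
"""
table_html = register_table_html(source)
```

In a notebook cell you would then pass table_html to display(HTML(...)) so the table is rendered instead of shown as a string.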
Next step is to hide the input. The nbextensions notebook extensions give you a lot of functionality within the notebook and was suggested here. One of the available extensions is Hide input, which as the name suggests, hides the input of a cell. The collapsed state is even maintained within the notebook metadata, so it displays collapsed as you'd expect when reopening the notebook.
Then within the notebook:
I am just getting started with Jupyter Notebook and I'm running into an issue when exporting.
In my current notebook, I alternate between code cells and markdown cells (which explain my code).
In the markdown cells, sometimes I will use a little HTML to display a table or a list. I will also use the bold tag <b></b> to emphasize a particular portion of text.
My problem is, when I export this notebook to PDF (via the menu in Jupyter Notebook) all of my HTML gets saved as plaintext.
For example, instead of displaying a table, when exporting to PDF, the HTML will be displayed instead. <tr>Table<tr> <th>part1</th>, etc.
I've tried exporting to HTML instead, but even the HTML file displays the HTML as plaintext.
I tried downloading nbconvert (which is probably what I'm using when I export from the Jupyter GUI anyway) and running it from the terminal, but I still get the same result.
Has anyone run into this problem before?
I tried to export it to html and it worked normally.
Where did you define your HTML? Did you use the Markdown cells?
Alternatives:
I don't have nbconvert, but what about exporting to HTML and using another tool to convert that to a PDF?
Use markdown language, it provides tables. Link:
https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet
Consider upgrading your notebook
I fixed this myself.
It turns out that somewhere in the code, there was a <plaintext> tag.
Although it did not run the entire length of the cell, the fact that the plaintext tag was there at all changed the dynamic of the cell.
Next, I had strange formatting errors (Text was of different size and strangely emphasized) when using = as plaintext in the cell. When opening the cell for editing, these = symbols were big bold and blue. This probably has something to do with the markdown language.
This was solved by placing the = on the same line as other text.
I did have to convert the page to HTML, then use a firefox addon to convert to PDF.
Converting to PDF from jupyter notebook uses LaTeX to transcribe the page, and all html is converted to plaintext.
The page appeared as normal with html tables, and normal html in the markdown cell. I just had to be careful with any extraneous tags.
If anyone else encounters this problem, check your html tags, and make sure that you are not accidentally doing something in markdown language.
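For anyone who wants to check their tags automatically, here is a rough sketch that scans the markdown cells of a notebook for unbalanced HTML tags. It is a heuristic built on the standard library only: the function names are mine, it assumes the nbformat 4 JSON layout, and Markdown constructs that merely look like tags can cause false alarms.

```python
import json
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Track opening and closing tags and record mismatches."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing form such as <br/> needs no partner

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def check_markdown_html(source):
    """Return a list of tag problems found in one markdown cell."""
    checker = TagBalanceChecker()
    checker.feed(source)
    checker.close()
    return checker.problems + [f"unclosed <{tag}>" for tag in checker.open_tags]


def check_notebook(nb):
    """Map cell index to problems for every markdown cell of a parsed notebook."""
    report = {}
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat stores the source as a list of lines
            source = "".join(source)
        problems = check_markdown_html(source)
        if problems:
            report[index] = problems
    return report


def check_notebook_file(path):
    """Load an .ipynb file and check it."""
    with open(path, encoding="utf-8") as handle:
        return check_notebook(json.load(handle))
```

A stray opening tag like the one described above would show up in the report as an "unclosed" entry for that cell.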
In Jupyter Notebook, via "File - Download as - PDF via LaTeX (.pdf)", I downloaded my notebook as a PDF file. However, many of my code blocks get printed outside the PDF page margins, i.e. longer code lines get cut off at the right border of the page. Is there any way to fix this so that I can have a readable PDF doc (other than manually adding hard returns to each line or the way suggested in this post)? Thanks!
I was having the same problem. Ultimately, I found the answer at:
http://www.markus-beuckelmann.de/blog/customizing-nbconvert-pdf.html
Basically, it involves adding a custom latex template that wraps lines. He also adjusts some of the font sizes so that the need for wrapping is less.
I did find one bug. His code is missing the final end macro line:
((*- endmacro *))
I put the file in ~/anaconda2/lib/python2.7/site-packages/nbconvert/templates/latex . The location will vary depending on your installation. I'm using anaconda 4.3.1 (Jupyter 4.3.1 as well?)
Actually, what I did was rename the existing article.tplx, change the "extends 'article.tplx'" line to reflect the new name, and write the new template as article.tplx. That way I can change the templates without having to restart the server.
I am looking at combining matplotlib and jinja2 to produce html pages.
What I do now is just including an image previously produced by matplotlib as a reference in my html page. The result is really static.
I've seen related questions like here or here, but none related to matplotlib/jinja integration (ultimately I'd like interactivity, but it does not seem to be simple enough for me).
Is there any alternative to what I do?
IPython notebook from v1.x uses jinja2.
You can create inline images in the notebook and then convert it to a static html with nbconvert. You can provide custom css and you can customise the output html to hide code cells, cell numbers, ...
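If you would rather stay with plain matplotlib and jinja2 outside the notebook, one option is to embed the figure in the page as a base64 data URI instead of referencing a file. This is only a sketch under that assumption: the template, the function names and the sample data are made up, and the result is still a static image, not an interactive plot.

```python
import base64
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
from jinja2 import Template

PAGE = Template(
    "<html><body>\n"
    "<h1>{{ title }}</h1>\n"
    '<img src="data:image/png;base64,{{ image_b64 }}" alt="{{ title }}">\n'
    "</body></html>"
)


def figure_to_base64(fig):
    """Render a matplotlib figure to PNG in memory and base64-encode it."""
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", bbox_inches="tight")
    return base64.b64encode(buffer.getvalue()).decode("ascii")


def render_page(title, fig):
    """Fill the jinja2 template with the title and the inline image."""
    return PAGE.render(title=title, image_b64=figure_to_base64(fig))


fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
ax.set_xlabel("x")
ax.set_ylabel("x squared")
page_html = render_page("Example plot", fig)
plt.close(fig)
```

Writing page_html to a file gives a single self-contained HTML page, since the image travels inside the markup.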