I have a pandas dataframe that I want to export to HTML using the to_html() method, with the numeric columns right-aligned. Can this be achieved with the formatters or float_format parameter? What would this look like? Could I create a function that checks whether a column contains only numbers and then assigns a certain class to it that can be linked to a CSS style? Note that I only want columns containing only numbers to be right-aligned. All other columns should not be affected. Also, the headers should all be left-aligned.
This regexp will replace the tag of cells containing a float or int.
Afterwards, you should add the relevant style to your page.
import re
re.sub(r"<td>(-?\d+(?:\.\d+)?)</td>", r"<td class='my_class'>\1</td>", df.to_html())
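For what it's worth, here is a minimal sketch of how the regexp approach could be wrapped in a function and combined with a style. The helper name tag_numeric_cells, the class name num and the sample data are all made up for illustration. Note that this tags individual cells rather than whole columns, so a numeric-looking string in a text column would be tagged too.

```python
import re

import pandas as pd

# Matches a <td> whose entire content is an int or a float (optional minus sign).
NUMERIC_CELL = re.compile(r"<td>(-?\d+(?:\.\d+)?)</td>")


def tag_numeric_cells(html: str, css_class: str = "num") -> str:
    """Add a CSS class to every <td> that holds only a number (hypothetical helper)."""
    return NUMERIC_CELL.sub(rf'<td class="{css_class}">\1</td>', html)


# Hypothetical sample data: one text column, two numeric columns.
df = pd.DataFrame({"name": ["foo", "bar"], "qty": [1, 20], "price": [1.5, 12.25]})

# Left-align all headers, right-align only the tagged numeric cells.
css = "<style>th { text-align: left; } td.num { text-align: right; }</style>"
page = css + tag_numeric_cells(df.to_html(index=False))
```

The resulting page string can be written to a file or passed to a template as-is.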
If I want to add html inside a normal dataframe, I can do
df.to_html(escape=False)
To ensure special characters are not escaped.
On the other hand if I want to use styles, I do
df.style.background_gradient(cmap='Blues').render()
How can I have both?
The render method seems to accept escape=False, but it doesn't do anything.
Additionally, my requirements are such that I would like to:
have the gradient be applied on the original df
be able to change some individual cells afterwards (specifically, I would like to make some cells clickable by surrounding them with <a onclick="...">...</a>)
Does anyone know how to do this?
EDIT
Here is an example
import pandas as pd
df = pd.DataFrame([{'i': i*i } for i in range(10)])
df['clickable'] = df['i'].apply(lambda i: f"""<a onClick="alert('you pressed ' + {i})">Click for {i}</a>""")
df.style.background_gradient(cmap='PuBu')
In the example above, I managed to get the 'clickable' column to be clickable. But I would like the 'i' column to be clickable too, while retaining its style.
I might be wrong, but it seems what you are looking for is something like this:
import pandas as pd
df = pd.DataFrame([{'i': i*i } for i in range(10)])
df.style.background_gradient(cmap='PuBu').format("""<a onClick="alert('{0}')">Click for {0}</a>""", subset=['i'])
This way, background_gradient applies the gradient based on the underlying values, and format tells the Styler how to render those values (everywhere, or only in specific columns via subset).
I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'text': ['foo foo', 'bar bar'],
'number': [1, 2]})
df
How do I center-align both the column titles/headers and the values in a dataframe, and how do I drop the index (the column with the values 0 and 1) in a dataframe?
Found an answer for this. This should do the trick to center-align both headers and values and hide the index:
df1 = df.style.set_table_styles([dict(selector='th', props=[('text-align', 'center')])])
df1.set_properties(**{'text-align': 'center'}).hide(axis='index')  # use .hide_index() on pandas < 1.4
Try IPython.display
from IPython.display import HTML
HTML(df.to_html(index=False))
Just to clarify the objective - what you want is to modify the display of the dataframe, not the dataframe itself. Of course, in the context of Jupyter, you may not care about the distinction - for example, if the only point of having the dataframe is to display it - but it's helpful to distinguish these things so you can use the right tools for the right thing.
In this case, as you've discovered, the styler gives you control over most aspects of the display of the dataframe - it does that by outputting and rendering html.
So if your dataframe 'looks' like this in Jupyter:
But you want it to look more like this:
you can use the styler to apply any number of styles (you chain them), so that styled HTML is rendered.
df.style.set_table_styles([dict(selector='th', props=[('text-align', 'center')])])\
.hide(axis='index')
In Jupyter this will display as I've shown above, but if you wanted to display this elsewhere and wanted the underlying HTML (let's say you're rendering a page in Flask based on this), you can use the .to_html() method, like so:
There are two essential advantages to working in this way:
The data in the dataframe remains in its original state - you haven't changed datatypes or content in any way
The styler opens up a vast array of tools to make your output look exactly the way you want.
In the context of Jupyter and numbers, this is particularly helpful because you don't have to modify the numbers (e.g. with rounding or string conversion), you just apply a style format and avoid exponential notation when you don't want it.
Here's a modified example, which shows how easy it is to use the styler to 'fix' the Jupyter Pandas numeric display problem:
df.style.set_properties(subset=["Feature", "Value"], **{'text-align': 'center'})
[Image: data set]
I am trying to drop the values in this pandas data frame that are numbers stored as strings. The problem is that I don't know of a way to locate them.
df['Country'].unique() returns what is shown in the image above.
However, '437.2' in df['Country'] returns False.
I would like to be able to create a list or set from 0-9, and search all strings in the column for numbers listed in the list/set, and finally drop the values where this condition is true.
df[~df['Country'].str.contains(r"\d")]
should give you what you want
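A small sketch of this on made-up data. It also shows why the `in` check returned False: `in` on a Series tests the index labels, not the values. The na=False argument is an addition here so that missing entries are kept rather than raising an error.

```python
import pandas as pd

# Hypothetical sample of the 'Country' column.
df = pd.DataFrame({"Country": ["France", "437.2", "Peru", "12", None]})

# `in` on a Series looks at the index (0..4), so this is False.
in_series = "437.2" in df["Country"]
# Checking the underlying values finds it.
in_values = "437.2" in df["Country"].values

# True for every string that contains at least one digit; missing values count as False.
has_digit = df["Country"].str.contains(r"\d", na=False)
clean = df[~has_digit]
```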
I was querying Stack Overflow to get some data (https://data.stackexchange.com/stackoverflow/query/new), and I have a data frame with Tags as a column. The tags originally were of the form
<html><css>
I managed to get them in the form of
html,css
I think an image of my Jupyter notebook can display it best:
How can I separate the tags so that they can become categorical variables, and I can transform them using something like get_dummies?
Everything I've seen refers to actual lists, like [html,css], rather than just comma separated words.
For this purpose, we can use df['Tags'].str.get_dummies(','), which basically performs split and converts each element to its own one-hot encoded column.
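For example, on a made-up Tags column (the tag values are invented):

```python
import pandas as pd

# Hypothetical comma-separated tags, as in the question.
df = pd.DataFrame({"Tags": ["html,css", "python", "css,python"]})

# One indicator column per distinct tag, 1 where the row has that tag.
dummies = df["Tags"].str.get_dummies(sep=",")

# Optionally attach the indicator columns to the original frame.
encoded = df.join(dummies)
```

The indicator columns come back in sorted order (css, html, python here), one row per original row.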
I want to add the units of my parameters next to each parameter in the column names of my dataframe. I also need to use statistical symbols for some column names, such as μ and σ².
I tried the following code, based on the r"$...$" syntax for mathematical symbols in Python, but it does not work for a dataframe:
P[r"Infiltration rate ($1/\h^-1$)"]=r['ACH_Base']
in order to attach the unit (1/h^-1) to the Infiltration rate parameter.
In my code I have already created a new dataframe "P", and I am adding the ACH_Base column from the "r" dataframe to P.
How can I add mathematical symbols for naming the columns in dataframes?
Thanks!!
It should work, but it depends on the backend used to display the dataframe. For instance, matplotlib has support to render LaTeX in plots.
Here is an example:
https://matplotlib.org/users/usetex.html#text-rendering-with-latex
LaTeX can also be rendered in Jupyter notebooks, but this applies only to Markdown cells, not to Python code:
http://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html?highlight=latex#LaTeX-equations
"\h" is an unknown symbol.
Does P[r"Infiltration rate ($1/h^-1$)"]=r['ACH_Base'] work to display what you want?
What unit do you wish to display? You can refer to https://matplotlib.org/users/mathtext.html and https://matplotlib.org/users/usetex.html#usetex-tutorial for more information on how to render text with LaTeX.
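If the display backend does not render LaTeX, one possible workaround is to put Unicode characters directly in the column names, since column labels are ordinary Python strings. The sketch below assumes the intended unit is h⁻¹ (which the question leaves unclear) and uses made-up ACH_Base values.

```python
import pandas as pd

# Hypothetical stand-in for the "r" dataframe from the question.
r = pd.DataFrame({"ACH_Base": [0.35, 0.50, 0.42]})

P = pd.DataFrame()
# Unit written with Unicode superscripts (assumed unit: per hour).
P["Infiltration rate (h⁻¹)"] = r["ACH_Base"]
# Statistical symbols as plain Unicode column names; scalars are broadcast to every row.
P["μ"] = r["ACH_Base"].mean()
P["σ²"] = r["ACH_Base"].var()

column_names = list(P.columns)
```

The $...$ form is still the one to use for labels that matplotlib draws, such as axis titles.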