How to not escape characters when using pandas styles - python

If I want to put HTML inside a normal dataframe, I can call
df.to_html(escape=False)
to ensure that special characters are not escaped.
On the other hand if I want to use styles, I do
df.style.background_gradient(cmap='Blues').render()
How can I have both?
The render method seems to accept escape=False, but it doesn't do anything.
Additionally, my requirements are such that I would like to:
have the gradient be applied on the original df
be able to change some individual cells afterwards (specifically, I would like to make some cells clickable by surrounding them with <a onclick="...">...</a>)
Does anyone know how to do this?
EDIT
Here is an example
import pandas as pd
df = pd.DataFrame([{'i': i*i } for i in range(10)])
df['clickable'] = df['i'].apply(lambda i: f"""<a onClick="alert('you pressed ' + {i})">Click for {i}</a>""")
df.style.background_gradient(cmap='PuBu')
In the example above, I managed to get the 'clickable' column to be clickable. But I would like the 'i' column to be clickable too, while retaining its style.

I might be wrong, but it seems what you are looking for is something like this:
import pandas as pd
df = pd.DataFrame([{'i': i*i } for i in range(10)])
df.style.background_gradient(cmap='PuBu').format("""<a onClick="alert('{0}')">Click for {0}</a>""", subset=['i'])
This way, background_gradient colors the cells based on their values, and format tells the Styler how to render those values (everywhere, or only in specific columns via subset).

Related

Python: show rows if there's certain keyword from the list and show what was the detected keyword

I was trying to get a data frame of spam messages so I can analyze them. This is what the original CSV file looks like.
I want it to be like
This is what I had tried:
###import the original CSV (it's simplified sample which has only two columns - sender, text)
import pandas as pd
df = pd.read_csv("spam.csv")
### if any of those is in the text column, I'll put that row in the new data frame.
keyword = ["prize", "bit.ly", "shorturl"]
### putting rows that have a keyword into a new data frame.
spam_list = df[df['text'].str.contains('|'.join(keyword))]
### creating a new column 'detected keyword' and trying to show what was detected keyword
spam_list['detected word'] = keyword
spam_list
However, "detected word" is in order of the list.
I know it's because I put the list into the new column, but I couldn't think/find a better way to do this. Should I have used "for" as the solution? Or am I approaching it in a totally wrong way?
You can define a function that gets the result for each row:
def detect_keyword(row):
    for key in keyword:
        if key in row['text']:
            return key
Then run it on all rows with DataFrame.apply() and save the results as a new column:
df['detected_word'] = df.apply(detect_keyword, axis=1)
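As a rough end-to-end sketch of this approach, with a small made-up dataframe standing in for spam.csv (the sample rows are assumptions, not the asker's data):

```python
import pandas as pd

# Hypothetical sample data standing in for spam.csv
df = pd.DataFrame({
    "sender": ["a", "b", "c"],
    "text": ["You won a prize!", "See you at lunch", "Visit bit.ly/abc now"],
})
keyword = ["prize", "bit.ly", "shorturl"]

def detect_keyword(row):
    """Return the first keyword found in the row's text, or None if there is none."""
    for key in keyword:
        if key in row["text"]:
            return key
    return None

df["detected_word"] = df.apply(detect_keyword, axis=1)

# Keep only the rows in which a keyword was detected
spam_list = df[df["detected_word"].notna()]
```

Rows without any keyword get a missing value in the new column, so the final filter plays the role of the str.contains step in the question.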
You can use the code shown in the picture below to solve your problem. I wasn't able to paste the code because Stack Overflow doesn't allow short links to be pasted. The link to the code is available.
The code has been adapted from here

Pandas right align columns with numbers after to_html()

I have a pandas dataframe that I want to export to HTML using the to_html() method. Is there a way this can be achieved with the formatters or float_format parameter? What would this look like? Could I create a function that checks whether a column contains only numbers and then assigns a certain class to it that can be linked to a CSS style? Note that I only want columns containing only numbers to be right-aligned. All other columns should not be affected. Also, the headers should all be left-aligned.
This regexp will replace the tag of cells containing a float or int.
Afterwards, you should add the relevant style to your page.
import re
re.sub(r"<td>(-?\d+(?:\.\d+)?)</td>", r"<td class='my_class'>\1</td>", df.to_html())
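A small sketch of how this might be used end to end. The class name my_class, the sample dataframe and the CSS rule are illustrative assumptions, and the pattern here matches only cells that consist entirely of an integer or decimal number:

```python
import re
import pandas as pd

# Hypothetical sample data: one text column, one numeric column
df = pd.DataFrame({"name": ["foo", "bar"], "value": [1.5, 20.0]})

# Tag every <td> whose whole content is a number
numeric_cell = re.compile(r"<td>(-?\d+(?:\.\d+)?)</td>")
table_html = numeric_cell.sub(r"<td class='my_class'>\1</td>", df.to_html())

# Style to add to the page: right-align tagged cells, left-align headers
css = "<style>td.my_class { text-align: right; } th { text-align: left; }</style>"
page = css + table_html
```

Note that this works per cell rather than per column, so a numeric-looking string in an otherwise textual column would be tagged as well.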

How to center align headers and values in a dataframe, and how to drop the index in a dataframe

I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'text': ['foo foo', 'bar bar'],
'number': [1, 2]})
df
How do I center-align both the column titles/headers and the values in a dataframe, and how do I drop the index (the column with the values 0 and 1) in a dataframe?
Found an answer for this. This should do the trick to center-align both headers and values and hide the index:
df1 = df.style.set_table_styles([dict(selector='th', props=[('text-align', 'center')])])
df1.set_properties(**{'text-align': 'center'}).hide(axis='index')
Try IPython.display
from IPython.display import HTML
HTML(df.to_html(index=False))
Just to clarify the objective - what you want is to modify the display of the dataframe, not the dataframe itself. Of course, in the context of Jupyter, you may not care about the distinction - for example, if the only point of having the dataframe is to display it - but it's helpful to distinguish these things so you can use the right tools for the right thing.
In this case, as you've discovered, the styler gives you control over most aspects of the display of the dataframe - it does that by outputting and rendering html.
So if your dataframe 'looks' like this in Jupyter:
But you want it to look more like this:
you can use the styler to apply any number of styles (you chain them), so that styled HTML is rendered.
df.style.set_table_styles([dict(selector='th', props=[('text-align', 'center')])])\
.hide(axis='index')
In Jupyter this will display as I've shown above, but if you wanted to display this elsewhere and wanted the underlying HTML (let's say you're rendering a page in Flask based on this), you can use the .to_html() method, like so:
There are two essential advantages to working in this way:
The data in the dataframe remains in its original state - you haven't changed datatypes or content in any way
The styler opens up a vast array of tools to make your output look exactly the way you want.
In the context of Jupyter and numbers, this is particularly helpful because you don't have to modify the numbers (e.g. with rounding or string conversion), you just apply a style format and avoid exponential notation when you don't want it.
Here's a modified example, which shows how easy it is to use the styler to 'fix' the Jupyter Pandas numeric display problem:
df.style.set_properties(subset=["Feature", "Value"], **{'text-align': 'center'})

implement a text classifier with python

I am trying to implement a Persian text classifier in Python. I use Excel to store my data and build my data set.
I would be thankful for any suggestions for a better implementation.
I tried this code to access the body of the messages that meet my conditions and store them. I took a screenshot of my Excel file to help.
For example, I want to store the body of the messages whose "foolish" column (column F) has the value 1 (true).
https://ibb.co/DzS1RpY "screenshot"
import pandas as pd
file='1.xlsx'
sorted=pd.read_excel(file,index_col='foolish')
var=sorted[['body']][sorted['foolish']=='1']
print(var.head())
The expected result is the body of rows 2, 4, 6 and 8.
Try assigning like this:
df_data=df["body"][df["foolish"]==1.0]
Don't use - in variable names, since it is a Python operator; use _ (underscore) instead.
Also note that this will return a series.
For a dataframe, use:
df_data = pd.DataFrame(df['body'][df["foolish"]==1.0])
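For illustration, a self-contained sketch of the same selection, using a made-up dataframe in place of 1.xlsx (the rows are assumptions) and .loc, which picks the rows and the column in one step:

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet
df = pd.DataFrame({
    "body": ["msg 1", "msg 2", "msg 3", "msg 4"],
    "foolish": [0, 1, 0, 1],
})

mask = df["foolish"] == 1

body_series = df.loc[mask, "body"]   # a Series
body_frame = df.loc[mask, ["body"]]  # a one-column DataFrame
```

The comparison is against the number 1, not the string '1', which matters when the Excel column is read in as numeric.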

Coloring Cells in Pandas

I am able to import data from an excel file using Pandas by using:
xl = read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])
Now I have all the data in xl as a DataFrame, and I would like to colour some cells in that data based on conditions defined in another function and export the result (with colour coding) to an Excel file.
Can someone tell me how should I go about this?
Thank you.
Pandas has a relatively new Styler feature where you can apply conditional formatting type manipulations to dataframes.
http://pandas.pydata.org/pandas-docs/stable/style.html
You can use some of their built-in functions like background_gradient or bar to replicate excel-like features like conditional formatting and data bars. You can also format cells to display percentages, floats, ints, etc. without changing the original dataframe.
Here's an example of the type of chart you can make using Styler (this is a nonsense chart but just meant to demonstrate features):
To harness the full functionality of Styler you should get comfortable with the Styler.apply() and Styler.applymap() APIs. These allow you to create custom functions and apply them to the table's columns, rows or elements. For example, if I wanted to color a positive cell green and a negative cell red, I'd create a function
def _color_red_or_green(val):
    color = 'red' if val < 0 else 'green'
    return 'color: %s' % color
and call it on my Styler object, i.e., df.style.applymap(_color_red_or_green).
With respect to exporting back to Excel, as far as I'm aware this is not supported in Styler yet so I'd probably go the xlsxwriter route if you NEED Excel for some reason. However, in my experience this is a great pure Python alternative, for example along with matplotlib charts and in emails/reports.
The simplest way is to use applymap and a lambda if you only want to highlight certain values:
df.style.applymap(lambda x: "background-color: red" if x>0 else "background-color: white")
There are quite a few ideas about styling the cells on the Pandas website.
However, it is mentioned there: "This is a new feature and still under development. We'll be adding features and possibly making breaking changes in future releases."
Try something like this:
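with pandas.ExcelWriter(path=Path, engine="xlsxwriter") as writer:
    xl.to_excel(writer, sheet_name="Sheet1")  # write the data first so the sheet exists
    sheet = writer.sheets["Sheet1"]
    cell_format = writer.book.add_format({"bg_color": "red"})
    sheet.write(x, y, value, cell_format)  # the format is what determines the color etc.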
More info here: https://xlsxwriter.readthedocs.org/format.html
