Coloring Cells in Pandas - python

I am able to import data from an excel file using Pandas by using:
xl = read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])
Now I have all the data in xl as a DataFrame. I would like to colour some cells in that data based on conditions defined in another function and export the result (with colour coding) to an Excel file.
Can someone tell me how I should go about this?
Thank you.

Pandas has a relatively new Styler feature where you can apply conditional formatting type manipulations to dataframes.
http://pandas.pydata.org/pandas-docs/stable/style.html
You can use some of their built-in functions like background_gradient or bar to replicate excel-like features like conditional formatting and data bars. You can also format cells to display percentages, floats, ints, etc. without changing the original dataframe.
Here's an example of the type of chart you can make using Styler (this is a nonsense chart but just meant to demonstrate features):
To harness the full functionality of Styler you should get comfortable with the Styler.apply() and Styler.applymap() APIs (applymap was renamed to Styler.map in pandas 2.1). These allow you to create custom functions and apply them to the table's columns, rows or elements. For example, if I wanted to color a positive cell green and a negative cell red, I'd create a function
def _color_red_or_green(val):
    color = 'red' if val < 0 else 'green'
    return 'color: %s' % color
and call it on my Styler object, i.e., df.style.applymap(_color_red_or_green).
With respect to exporting back to Excel, as far as I'm aware this was not supported in Styler at the time of writing (newer pandas releases do provide Styler.to_excel), so I'd probably go the xlsxwriter route if you NEED Excel for some reason. However, in my experience Styler is a great pure-Python alternative, for example alongside matplotlib charts and in emails/reports.

The simplest way is to use applymap and a lambda if you only want to highlight certain values:
df.style.applymap(lambda x: "background-color: red" if x>0 else "background-color: white")

There are quite a few ideas about styling the cells on the Pandas website.
However, the documentation notes: "This is a new feature and still under development. We'll be adding features and possibly making breaking changes in future releases."

try something like this:
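with pandas.ExcelWriter(path, engine="xlsxwriter") as writer:
    df.to_excel(writer, sheet_name="Sheet1")  # write the data first so the sheet exists
    sheet = writer.sheets["Sheet1"]
    fmt = writer.book.add_format({"bg_color": "red"})
    sheet.write(row, col, value, fmt)  # fmt is what determines the color etc.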
More info here: https://xlsxwriter.readthedocs.org/format.html

Related

How do I color flagged rows in an excel file exported from a pandas dataframe?

I am cleaning some data on an excel sheet using pandas dataframes, and I have a list of flagged indices. Is there a way to color these specific rows red or yellow in pandas itself before exporting?
The flagged indices are in a list: [30556, 30981, 55893... etc]
I'm not sure what to try here as I'm new to Data Analysis with Python, only done a beginners course on EdX, so any help would be appreciated :)
The word you wanted to Google for was "styling".
Copious documentation is available.
https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html
Briefly excerpting from that, one can define a styler function
def make_pretty(styler):
    styler.background_gradient(axis=None, vmin=1, vmax=5, cmap="YlGnBu")
    return styler
and use it to, e.g., map increasing values to background colors that are increasingly blue:
weather_df.loc[start:end].style.pipe(make_pretty)
Many additional styling methods are provided.

Is there a way to display all the values in a .xlsx spreadsheet for a specific colour in a pandas DataFrame besides styleframe?

I have a Microsoft Excel spreadsheet (screenshot) that I'm trying to format using the pandas library in Python, but I can't seem to find any way to select only the cells that have a specific colour (blue, for instance). So far, I've tried both the styleframe and openpyxl libraries, but I couldn't get either to work without errors.
With styleframe (using this implementation), I believe I can only find specific basic colours that the library supports in its utils module (here). However, my spreadsheet has more advanced colour codes, which styleframe is unable to find, giving me an empty DataFrame as the output.
Empty DataFrame
Columns: []
Index: []
Code:
import numpy as np
from styleframe import StyleFrame, utils
def find_bs_cs_2021(cell):
    return cell if cell.style.bg_color in {utils.colors.dark_yellow, 'FFB740'} else np.nan
def main():
    styleframe_dataframe = StyleFrame.read_excel('TimeTable, FSC, Fall-2022.xlsx', sheet_name='Monday', read_style=True, use_openpyxl_styles=False)
    find = StyleFrame(styleframe_dataframe.applymap(find_bs_cs_2021).dropna(how='all').dropna(how='all', axis=1))
    print(find)
Is there any way to select these cells that styleframe doesn't support, either by using the library or any other library? Eventually, what I want to do is find all the values in the spreadsheet with a specific colour, along with their indexes and column names. I'd highly appreciate any assistance regarding this! :)
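One possible route, sketched here with openpyxl alone, is to walk the worksheet and compare each cell's solid fill colour against a target hex code. This is an illustration under assumptions: the workbook is built in memory with made-up values instead of loading the asker's file, `cells_with_fill` is a hypothetical helper, and only explicit RGB fills are matched (theme and indexed colours would need extra handling).

```python
from openpyxl import Workbook
from openpyxl.styles import PatternFill


def cells_with_fill(ws, hex_rgb):
    """Return (coordinate, value) for every cell whose solid fill matches hex_rgb."""
    target = hex_rgb.upper()
    hits = []
    for row in ws.iter_rows():
        for cell in row:
            fill = cell.fill
            color = fill.fgColor
            # openpyxl stores colours as 8-digit ARGB, so compare the last six digits.
            if (
                fill.fill_type == "solid"
                and color.type == "rgb"
                and str(color.rgb)[-6:].upper() == target
            ):
                hits.append((cell.coordinate, cell.value))
    return hits


# Build a tiny workbook in memory; with a real file you would use
# openpyxl.load_workbook(path) and pick the sheet by name instead.
wb = Workbook()
ws = wb.active
ws["A1"] = "Room"
ws["B2"] = "BS-CS 2021"
ws["B2"].fill = PatternFill(fill_type="solid", start_color="FFB740")
ws["C3"] = "other"

hits = cells_with_fill(ws, "FFB740")
```

The coordinates (such as `B2`) give the row and column of each match, which can then be mapped back to index and column labels.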

How to not escape characters when using pandas styles

If I want to add html inside a normal dataframe, I can do
df.to_html(escape=False)
To ensure special characters are not escaped.
On the other hand if I want to use styles, I do
df.style.background_gradient(cmap='Blues').render()
How can I have both?
The render method seems to accept escape=False, but it doesn't do anything.
Additionally, my requirements are such that I would like to:
have the gradient be applied on the original df
be able to change some individual cells afterwards (specifically, I would like to make some cells clickable by surrounding them with <a onclick="...">...</a>)
Does anyone know how to do this?
EDIT
Here is an example
import pandas as pd
df = pd.DataFrame([{'i': i*i } for i in range(10)])
df['clickable'] = df['i'].apply(lambda i: f"""<a onClick="alert('you pressed ' + {i})">Click for {i}</a>""")
df.style.background_gradient(cmap='PuBu')
In the example above, I managed to get the 'clickable' column to be clickable. But I would like the 'i' column to be clickable too, while retaining its style.
I might be wrong, but it seems what you are looking for is something like this:
import pandas as pd
df = pd.DataFrame([{'i': i*i } for i in range(10)])
df.style.background_gradient(cmap='PuBu').format("""<a onClick="alert('{0}')">Click for {0}</a>""", subset=['i'])
This way background_gradient applies the gradient based on the underlying values, and format tells the styler how you want to render those values (everywhere, or in specific columns using subset).

How to center align headers and values in a dataframe, and how to drop the index in a dataframe

I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'text': ['foo foo', 'bar bar'],
'number': [1, 2]})
df
How do I center-align both the column titles/headers and the values in a dataframe, and how do I drop the index (the column with the values 0 and 1) in a dataframe?
Found an answer for this. This should do the trick to center-align both headers and values and hide the index:
df1 = df.style.set_table_styles([dict(selector='th', props=[('text-align', 'center')])])
df1.set_properties(**{'text-align': 'center'}).hide(axis='index')  # hide_index() on pandas older than 1.4
Try IPython.display
from IPython.display import HTML
HTML(df.to_html(index=False))
Just to clarify the objective - what you want is to modify the display of the dataframe, not the dataframe itself. Of course, in the context of Jupyter, you may not care about the distinction - for example, if the only point of having the dataframe is to display it - but it's helpful to distinguish these things so you can use the right tools for the right thing.
In this case, as you've discovered, the styler gives you control over most aspects of the display of the dataframe - it does that by outputting and rendering html.
So if your dataframe 'looks' like this in Jupyter:
But you want it to look more like this:
you can use the styler to apply any number of styles (you chain them), so that styled HTML is rendered.
df.style.set_table_styles([dict(selector='th', props=[('text-align', 'center')])])\
.hide(axis='index')
In Jupyter this will display as I've shown above, but if you wanted to display this elsewhere and wanted the underlying HTML (let's say you're rendering a page in Flask based on this), you can use the .to_html() method, like so:
df.style.set_table_styles([dict(selector='th', props=[('text-align', 'center')])]).hide(axis='index').to_html()
There are two essential advantages to working in this way:
The data in the dataframe remains in its original state - you haven't changed datatypes or content in any way
The styler opens up a vast array of tools to make your output look exactly the way you want.
In the context of Jupyter and numbers, this is particularly helpful because you don't have to modify the numbers (e.g. with rounding or string conversion), you just apply a style format and avoid exponential notation when you don't want it.
Here's a modified example, which shows how easy it is to use the styler to 'fix' the Jupyter Pandas numeric display problem:
df.style.set_properties(subset=["Feature", "Value"], **{'text-align': 'center'})

Plot Subset of Dataframe without Being Redundant

a bit of a Python newb here. As a beginner it's easy to learn different functions and methods from training classes but it's another thing to learn how to "best" code in Python.
I have a simple scenario where I'm looking to plot a portion of a dataframe spdf. I only want to plot instances where speed is greater than 0 and use datetime as my X-axis. The way I've managed to get the job done seems awfully redundant to me:
ts = pd.Series(spdf[spdf['speed']>0]['speed'].values, index=spdf[spdf['speed']>0]['datetime'])
ts.dropna().plot(title='SP1 over Time')
Is there a better way to plot this data without specifying the subset of my dataframe twice?
You don't need to build a new Series. You can plot using your original df
df[df['col'] > 0].plot()
In your case:
spdf[spdf['speed'] > 0].dropna().plot(title='SP1 over Time')
I'm not sure what your spdf object is or how it was created. If you'll often need to plot using the 'datetime' column, you can set it as the index of the df. If you're reading the data from a csv, you can do this using the index_col and parse_dates keyword arguments, or if you already have the df, you can change the index using df.set_index('datetime'). You can use df.info() to see what is currently being used as your index and its datatype.
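A short sketch of the set_index approach, using a made-up stand-in for spdf (the column names follow the question; the values are invented). The mask is written once, and the plot call is left commented out because it needs matplotlib.

```python
import pandas as pd

# Made-up stand-in for the asker's spdf.
spdf = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2015-01-01 00:00", "2015-01-01 00:01", "2015-01-01 00:02", "2015-01-01 00:03"]
        ),
        "speed": [0.0, 12.5, None, 30.0],
    }
)

# Make datetime the index once, then select the speed column as a Series.
ts = spdf.set_index("datetime")["speed"]

# A single boolean mask; NaN > 0 is False, so missing speeds drop out as well.
moving = ts[ts > 0]

# moving.plot(title="SP1 over Time")  # needs matplotlib
```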
