Copying the styling results of one dataframe to another - python

I have two dataframes of identical size, df_a and df_b. df_a only contains numbers and is styled using a colormap (either using xlsxwriter or pandas styling). df_b contains mixed values. Is it possible to copy the resulting style from one dataframe to the other such that the background color of df_b[i,j] equals the background color of df_a[i,j]?
In the example below the first element of df_b should be red, the second yellow, and so on.
Formatting code for df_a (based on the xlsxwriter example)
# Create a Pandas dataframe from some data.
df_a = pd.DataFrame({'col_a': [10, 20, 30, 20, 15, 30, 45]})
df_b = pd.DataFrame({'col_b': ['a', 'a', 'c', 'd', 'e', 'f', 'g']})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df_a.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Apply a conditional format to the cell range.
worksheet.conditional_format('B2:C8', {'type': '3_color_scale'})
# Close the Pandas Excel writer and output the Excel file.
writer.close()

Let's use Pandas' style to format the data:
# Color each column with one of three categories derived from df_a
def style(df):
    s = {}
    for c1, c2 in zip(df.columns, df_a.columns):
        s[c1] = pd.cut(df_a[c2], bins=3,
                       labels=[f'background-color:{c}' for c in ['red','blue','green']])
    return pd.DataFrame(s)
writer = pd.ExcelWriter(output_file)
df_a.style.apply(style, axis=None).to_excel(writer, sheet_name='df_a', index=False)
df_b.style.apply(style, axis=None).to_excel(writer, sheet_name='df_b', index=False)
writer.close()
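To check what such a style function produces before anything is written to Excel, the frame of CSS strings can be built and inspected on its own. This is only a minimal sketch, not the answerer's code: the helper name css_from is hypothetical, and it assumes both frames have the same number of rows and columns.

```python
import pandas as pd

df_a = pd.DataFrame({'col_a': [10, 20, 30, 20, 15, 30, 45]})
df_b = pd.DataFrame({'col_b': ['a', 'a', 'c', 'd', 'e', 'f', 'g']})

def css_from(reference, target):
    """Hypothetical helper: CSS strings shaped like `target`, binned from `reference`."""
    labels = [f'background-color:{c}' for c in ['red', 'blue', 'green']]
    css = {}
    for tgt_col, ref_col in zip(target.columns, reference.columns):
        # Three equal-width bins over the reference column, one color per bin
        css[tgt_col] = pd.cut(reference[ref_col], bins=3, labels=labels).astype(str)
    return pd.DataFrame(css, index=target.index)

css_b = css_from(df_a, df_b)
print(css_b)
```

The resulting frame can then be handed to df_b.style.apply(lambda _: css_b, axis=None), because with axis=None the function passed to Styler.apply must return a DataFrame of CSS strings with the same index and columns as the styled frame.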

This solution expands the accepted solution to more than one column and uses a color map instead of a predefined limited set of colors.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib
np.random.seed(24)
df_a = pd.DataFrame(10*np.random.randn(10, 4), columns=list('ABCD'))
df_b= pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
cm = sns.color_palette("Spectral", as_cmap=True)
def style(df):
    # Normalize df_a to [0, 1] and map every value to a hex color
    data = df_a.values
    data_normalized = (data - np.min(data)) / (np.max(data) - np.min(data))
    bg_colors = cm(data_normalized)
    bg_colors = np.apply_along_axis(matplotlib.colors.to_hex, 2, bg_colors)
    s = pd.DataFrame(bg_colors, columns=df.columns)
    for col in s.columns:
        s[col] = s[col].apply(lambda c: f'background-color:{c}')
    return s
writer = pd.ExcelWriter('temp.xlsx')
df_a.style.apply(style, axis=None).to_excel(writer, sheet_name='df_a', index=False)
df_b.style.apply(style, axis=None).to_excel(writer, sheet_name='df_b', index=False)
writer.close()
This results in the following:

Related

Write a pandas dataframe into an existing excel file [duplicate]

I am having trouble updating an Excel sheet using pandas by writing new values into it. I already have an existing frame df1 that reads the values from MySheet1.xlsx, so this needs to either be a new dataframe or somehow copy and overwrite the existing one.
The spreadsheet is in this format:
I have a python list: values_list = [12.34, 17.56, 12.45]. My goal is to insert the list values under Col_C header vertically. It is currently overwriting the entire dataframe horizontally, without preserving the current values.
df2 = pd.DataFrame({'Col_C': values_list})
writer = pd.ExcelWriter('excelfile.xlsx', engine='xlsxwriter')
df2.to_excel(writer, sheet_name='MySheet1')
workbook = writer.book
worksheet = writer.sheets['MySheet1']
How to get this end result? Thank you!
Below I've provided a fully reproducible example of how you can go about modifying an existing .xlsx workbook using pandas and the openpyxl module.
First, for demonstration purposes, I create a workbook called test.xlsx:
from openpyxl import load_workbook
import pandas as pd
writer = pd.ExcelWriter('test.xlsx', engine='openpyxl')
wb = writer.book
df = pd.DataFrame({'Col_A': [1,2,3,4],
'Col_B': [5,6,7,8],
'Col_C': [0,0,0,0],
'Col_D': [13,14,15,16]})
df.to_excel(writer, index=False)
wb.save('test.xlsx')
This is the Expected output at this point:
In this second part, we load the existing workbook ('test.xlsx') and modify the third column with different data.
from openpyxl import load_workbook
import pandas as pd
df_new = pd.DataFrame({'Col_C': [9, 10, 11, 12]})
wb = load_workbook('test.xlsx')
ws = wb['Sheet1']
for index, row in df_new.iterrows():
    # Row 1 holds the header, so data row 0 goes to C2
    cell = 'C%d' % (index + 2)
    ws[cell] = row.iloc[0]
wb.save('test.xlsx')
This is the Expected output at the end:
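If pandas 1.4 or newer is available, the same column replacement can probably be done through pandas itself, by opening the workbook in append mode and overlaying the new column at its position. This is a sketch under assumptions rather than part of the answer above: the temporary path is made up for the demo, the sheet is assumed to have the default name Sheet1, and Col_C is assumed to be the third column (startcol=2).

```python
import os
import tempfile
import pandas as pd

# Demo workbook in a temporary directory (hypothetical path)
path = os.path.join(tempfile.mkdtemp(), 'test.xlsx')
df = pd.DataFrame({'Col_A': [1, 2, 3, 4],
                   'Col_B': [5, 6, 7, 8],
                   'Col_C': [0, 0, 0, 0],
                   'Col_D': [13, 14, 15, 16]})
df.to_excel(path, index=False)

df_new = pd.DataFrame({'Col_C': [9, 10, 11, 12]})

# mode='a' keeps the existing workbook; 'overlay' writes into the existing
# sheet instead of replacing it, so the other columns stay untouched.
with pd.ExcelWriter(path, engine='openpyxl', mode='a',
                    if_sheet_exists='overlay') as writer:
    df_new.to_excel(writer, sheet_name='Sheet1', startcol=2, index=False)

result = pd.read_excel(path)
print(result)
```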
In my opinion, the easiest solution is to read the Excel file into a pandas DataFrame, modify it, and write it back out as an Excel file. For example:
Comments:
Import pandas as pd.
Read the Excel sheet into a pandas DataFrame called ExcelDataInPandasDataFrame.
Take your data, which could be in a list, and assign it to the column you want (just make sure the lengths are the same).
Save your DataFrame as an Excel file, either overwriting the old file or creating a new one.
Code:
import pandas as pd
ExcelDataInPandasDataFrame = pd.read_excel("./YourExcel.xlsx")
YourDataInAList = [12.34,17.56,12.45]
ExcelDataInPandasDataFrame["Col_C"] = YourDataInAList
ExcelDataInPandasDataFrame.to_excel("./YourNewExcel.xlsx", index=False)
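The remark about matching lengths matters for the list in the question, which has three values. Assigning a plain list of the wrong length raises a ValueError; one way around that, sketched here with a made-up four-row frame, is to wrap the list in a Series so that pandas aligns on the index and pads the rest with NaN.

```python
import pandas as pd

# Made-up frame standing in for the sheet that was read from Excel
df = pd.DataFrame({'Col_A': [1, 2, 3, 4], 'Col_C': [0, 0, 0, 0]})
values_list = [12.34, 17.56, 12.45]

# The Series carries the index labels of the first three rows, so the
# assignment aligns on those labels and leaves the last row as NaN.
df['Col_C'] = pd.Series(values_list, index=df.index[:len(values_list)])
print(df)
```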

Custom Excel column using pandas [duplicate]

I am being asked to generate some Excel reports. I am currently using pandas quite heavily for my data, so naturally I would like to use the pandas.ExcelWriter method to generate these reports. However the fixed column widths are a problem.
The code I have so far is simple enough. Say I have a dataframe called df:
writer = pd.ExcelWriter(excel_file_path, engine='openpyxl')
df.to_excel(writer, sheet_name="Summary")
I was looking over the pandas docs, and I don't really see any options to set column widths. Is there a trick to make it such that the columns auto-adjust to the data? Or is there something I can do after the fact to the xlsx file to adjust the column widths?
(I am using the OpenPyXL library, and generating .xlsx files - if that makes any difference.)
Inspired by user6178746's answer, I have the following:
# Given a dict of dataframes, for example:
# dfs = {'gadgets': df_gadgets, 'widgets': df_widgets}
writer = pd.ExcelWriter(filename, engine='xlsxwriter')
for sheetname, df in dfs.items():  # loop through `dict` of dataframes
    df.to_excel(writer, sheet_name=sheetname)  # send df to writer
    worksheet = writer.sheets[sheetname]  # pull worksheet object
    for idx, col in enumerate(df):  # loop through all columns
        series = df[col]
        max_len = max((
            series.astype(str).map(len).max(),  # len of largest item
            len(str(series.name))  # len of column name/header
            )) + 1  # adding a little extra space
        worksheet.set_column(idx, idx, max_len)  # set column width
writer.close()
Dynamically adjust all the column lengths
writer = pd.ExcelWriter('/path/to/output/file.xlsx')
df.to_excel(writer, sheet_name='sheetName', index=False, na_rep='NaN')
for column in df:
    column_length = max(df[column].astype(str).map(len).max(), len(column))
    col_idx = df.columns.get_loc(column)
    writer.sheets['sheetName'].set_column(col_idx, col_idx, column_length)
writer.close()
Manually adjust a column using Column Name
col_idx = df.columns.get_loc('columnName')
writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)
Manually adjust a column using Column Index
writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)
In case any of the above is failing with
AttributeError: 'Worksheet' object has no attribute 'set_column'
make sure to install xlsxwriter:
pip install xlsxwriter
For a more comprehensive explanation you can read the article How to Auto-Adjust the Width of Excel Columns with Pandas ExcelWriter on TDS.
I'm posting this because I just ran into the same issue and found that the official documentation for XlsxWriter and pandas still lists this functionality as unsupported. I hacked together a solution that solved the issue I was having. I basically just iterate through each column and use worksheet.set_column to set the column width to the max length of the contents of that column.
One important note, however. This solution does not fit the column headers, simply the column values. That should be an easy change though if you need to fit the headers instead. Hope this helps someone :)
import pandas as pd
import sqlalchemy as sa
import urllib.parse
read_server = 'serverName'
read_database = 'databaseName'
read_params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER="+read_server+";DATABASE="+read_database+";TRUSTED_CONNECTION=Yes")
read_engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % read_params)
#Output some SQL Server data into a dataframe
my_sql_query = """ SELECT * FROM dbo.my_table """
my_dataframe = pd.read_sql_query(my_sql_query,con=read_engine)
#Set destination directory to save excel.
xlsFilepath = r'H:\my_project' + "\\" + 'my_file_name.xlsx'
writer = pd.ExcelWriter(xlsFilepath, engine='xlsxwriter')
#Write excel to file using pandas to_excel
my_dataframe.to_excel(writer, startrow = 1, sheet_name='Sheet1', index=False)
#Indicate workbook and worksheet for formatting
workbook = writer.book
worksheet = writer.sheets['Sheet1']
#Iterate through each column and set the width == the max length in that column. A padding length of 2 is also added.
for i, col in enumerate(my_dataframe.columns):
    # find length of column i
    column_len = my_dataframe[col].astype(str).str.len().max()
    # Setting the length if the column header is larger
    # than the max column value length
    column_len = max(column_len, len(col)) + 2
    # set the column length
    worksheet.set_column(i, i, column_len)
writer.close()
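The width calculation in the loop above does not depend on the Excel engine, so it can be pulled out into a small function and checked without writing a file. The following is a sketch only: the name column_widths and the padding and max_width parameters are hypothetical, and the cap of 50 is an arbitrary choice.

```python
import pandas as pd

def column_widths(df, padding=2, max_width=50):
    """Hypothetical helper: map column position -> width in characters."""
    widths = {}
    for i, col in enumerate(df.columns):
        # Longest cell when rendered as text; 0 for a frame with no rows
        longest = int(df[col].astype(str).str.len().max()) if len(df) else 0
        # Fit the header as well, add padding, and cap very wide columns
        widths[i] = min(max(longest, len(str(col))) + padding, max_width)
    return widths

demo = pd.DataFrame({'a': ['x', 'longer text'], 'header_longer': [1, 22]})
widths = column_widths(demo)
print(widths)
```

With XlsxWriter the result could then be applied with worksheet.set_column(i, i, width) for each item, adding an offset of one if the index is written as well.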
There is a nice package that I started to use recently called StyleFrame.
It takes a DataFrame and lets you style it very easily.
By default, the column widths auto-adjust.
For example:
from StyleFrame import StyleFrame
import pandas as pd
df = pd.DataFrame({'aaaaaaaaaaa': [1, 2, 3],
'bbbbbbbbb': [1, 1, 1],
'ccccccccccc': [2, 3, 4]})
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(excel_writer=excel_writer, row_to_add_filters=0,
columns_and_rows_to_freeze='B2')
excel_writer.save()
You can also change the column widths:
sf.set_column_width(columns=['aaaaaaaaaaa', 'bbbbbbbbb'],
width=35.3)
UPDATE 1
In version 1.4 best_fit argument was added to StyleFrame.to_excel.
See the documentation.
UPDATE 2
Here's a sample of code that works for StyleFrame 3.x.x
from styleframe import StyleFrame
import pandas as pd
columns = ['aaaaaaaaaaa', 'bbbbbbbbb', 'ccccccccccc', ]
df = pd.DataFrame(data={
'aaaaaaaaaaa': [1, 2, 3, ],
'bbbbbbbbb': [1, 1, 1, ],
'ccccccccccc': [2, 3, 4, ],
}, columns=columns,
)
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(
excel_writer=excel_writer,
best_fit=columns,
columns_and_rows_to_freeze='B2',
row_to_add_filters=0,
)
excel_writer.save()
There is probably no automatic way to do it right now, but as you use openpyxl, the following line (adapted from another answer by user Bufke on how to do it manually) allows you to specify a sane value (in character widths):
writer.sheets['Summary'].column_dimensions['A'].width = 15
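Building on that one-liner, a width can be set for every column by converting the column position to its letter. This is a rough sketch with made-up data and a temporary path; it assumes the openpyxl engine, that the index is not written (index=False), and a padding of 2 characters chosen arbitrarily.

```python
import os
import tempfile
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter

df = pd.DataFrame({'name': ['alpha', 'beta'], 'a_rather_long_header': [1, 2]})
path = os.path.join(tempfile.mkdtemp(), 'widths.xlsx')  # hypothetical path

with pd.ExcelWriter(path, engine='openpyxl') as writer:
    df.to_excel(writer, sheet_name='Summary', index=False)
    ws = writer.sheets['Summary']
    # openpyxl columns are addressed by letter and counted from 1
    for i, col in enumerate(df.columns, start=1):
        longest = max(int(df[col].astype(str).map(len).max()), len(str(col)))
        ws.column_dimensions[get_column_letter(i)].width = longest + 2

# Read the widths back to confirm they were stored in the file
saved = load_workbook(path)['Summary']
print(saved.column_dimensions['A'].width, saved.column_dimensions['B'].width)
```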
You can do this with pandas and xlsxwriter; the code below works in Python 3.x. For more details on working with XlsxWriter with pandas, this link might be useful: https://xlsxwriter.readthedocs.io/working_with_pandas.html
import pandas as pd
writer = pd.ExcelWriter(excel_file_path, engine='xlsxwriter')
df.to_excel(writer, sheet_name="Summary")
workbook = writer.book
worksheet = writer.sheets["Summary"]
#set the column width as per your requirement
worksheet.set_column('A:A', 25)
writer.close()
I found that it was more useful to adjust the column width based on the column header rather than the column content.
Using df.columns.values.tolist() I generate a list of the column headers and use the lengths of these headers to determine the width of the columns.
See full code below:
import pandas as pd
import xlsxwriter
writer = pd.ExcelWriter(filename, engine='xlsxwriter')
df.to_excel(writer, index=False, sheet_name=sheetname)
workbook = writer.book # Access the workbook
worksheet= writer.sheets[sheetname] # Access the Worksheet
header_list = df.columns.values.tolist() # Generate list of headers
for i in range(0, len(header_list)):
    worksheet.set_column(i, i, len(header_list[i])) # Set column widths based on len(header)
writer.close() # Save the excel file
At work, I am always writing dataframes to Excel files. So instead of writing the same code over and over, I have created a module. Now I just import it and use it to write and format the Excel files. There is one downside though: it takes a long time if the dataframe is very large.
So here is the code:
import os
import pandas as pd
def result_to_excel(output_name, dataframes_list, sheet_names_list, output_dir):
    out_path = os.path.join(output_dir, output_name)
    writerReport = pd.ExcelWriter(out_path, engine='xlsxwriter',
                                  datetime_format='yyyymmdd', date_format='yyyymmdd')
    workbook = writerReport.book
    # loop through the list of dataframes to save every dataframe into a new sheet in the excel file
    for i, dataframe in enumerate(dataframes_list):
        sheet_name = sheet_names_list[i]  # choose the sheet name from sheet_names_list
        dataframe.to_excel(writerReport, sheet_name=sheet_name, index=False, startrow=0)
        # Add a header format.
        header_format = workbook.add_format({
            'bold': True,
            'border': 1,
            'fg_color': '#0000FF',
            'font_color': 'white'})
        # Write the column headers with the defined format.
        worksheet = writerReport.sheets[sheet_name]
        for col_num, col_name in enumerate(dataframe.columns.values):
            worksheet.write(0, col_num, col_name, header_format)
        worksheet.autofilter(0, 0, 0, len(dataframe.columns) - 1)
        worksheet.freeze_panes(1, 0)
        # loop through the columns in the dataframe to get the width of the column
        for j, col in enumerate(dataframe.columns):
            max_width = max([len(str(s)) for s in dataframe[col].values] + [len(col) + 2])
            # cap the width so that no column gets too wide
            if max_width > 50:
                max_width = 50
            worksheet.set_column(j, j, max_width)
    writerReport.close()
    return out_path
Combining the other answers and comments and also supporting multi-indices:
def autosize_excel_columns(worksheet, df):
    # Size the index columns first, then the data columns to their right
    autosize_excel_columns_df(worksheet, df.index.to_frame())
    autosize_excel_columns_df(worksheet, df, offset=df.index.nlevels)
def autosize_excel_columns_df(worksheet, df, offset=0):
    for idx, col in enumerate(df):
        series = df[col]
        max_len = max((
            series.astype(str).map(len).max(),
            len(str(series.name))
        )) + 1
        worksheet.set_column(idx+offset, idx+offset, max_len)
sheetname=...
df.to_excel(writer, sheet_name=sheetname, freeze_panes=(df.columns.nlevels, df.index.nlevels))
worksheet = writer.sheets[sheetname]
autosize_excel_columns(worksheet, df)
writer.close()
You can solve the problem by calling the following function, where df is the dataframe whose column sizes you want and sheetname is the Excel sheet to modify. Note that it relies on an existing writer object in the enclosing scope:
def auto_width_columns(df, sheetname):
    workbook = writer.book
    worksheet= writer.sheets[sheetname]
    for i, col in enumerate(df.columns):
        column_len = max(df[col].astype(str).str.len().max(), len(col) + 2)
        worksheet.set_column(i, i, column_len)
import openpyxl
..
for col in _ws.columns:
    max_length = 0
    # Column letter of this column, e.g. 'A' (also correct for 'AA' and beyond)
    col_name = openpyxl.utils.get_column_letter(col[0].column)
    for cell in col:
        # Measure every non-empty cell by its text representation
        if cell.value is not None:
            max_length = max(max_length, len(str(cell.value)))
    adjusted_width = (max_length+2)
    _ws.column_dimensions[col_name].width = adjusted_width
Yes, there is something you can do to the xlsx file after the fact to adjust the column widths.
Use xlwings to autofit columns. It's a pretty simple solution, see the 6 last lines of the example code. The advantage of this procedure is that you don't have to worry about font size, font type or anything else.
Requirement: Excel installation.
import pandas as pd
import xlwings as xw
path = r"test.xlsx"
# Export your dataframe in question.
df = pd._testing.makeDataFrame()
df.to_excel(path)
# Autofit all columns with xlwings.
with xw.App(visible=False) as app:
    wb = xw.Book(path)
    for ws in wb.sheets:
        ws.autofit(axis="columns")
    wb.save(path)
    wb.close()
Easiest solution is to specify width of column in set_column method.
for worksheet in writer.sheets.values():
    worksheet.set_column(0,last_column_value, required_width_constant)
This function works for me, also fixes the index width
def write_to_excel(writer, X, sheet_name, sep_only=False):
    #writer=writer object
    #X=dataframe
    #sheet_name=name of sheet
    #sep_only=True:write only as separate excel file, False: write as sheet to the writer object
    #output_folder and prefix_excel_save must be defined in the enclosing scope
    if sheet_name=="":
        print("specify sheet_name!")
    else:
        X.to_excel(f"{output_folder}{prefix_excel_save}_{sheet_name}.xlsx")
        if not sep_only:
            X.to_excel(writer, sheet_name=sheet_name)
            #fix column widths
            worksheet = writer.sheets[sheet_name]  # pull worksheet object
            for idx, col in enumerate(X.columns):  # loop through all columns
                series = X[col]
                max_len = max((
                    series.astype(str).map(len).max(),  # len of largest item
                    len(str(series.name))  # len of column name/header
                    )) + 1  # adding a little extra space
                worksheet.set_column(idx+1, idx+1, max_len)  # set column width (+1 because the index occupies column 0)
            #fix index width
            max_len=pd.Series(X.index.values).astype(str).map(len).max()+1
            worksheet.set_column(0, 0, max_len)
        if sep_only:
            print(f'{sheet_name} is written as separate file')
        else:
            print(f'{sheet_name} is written as separate file')
            print(f'{sheet_name} is written as sheet')
    return writer
call example:
writer = write_to_excel(writer, dataframe, "Statistical_Analysis")
I may be a bit late to the party, but this code works when using 'openpyxl' as your engine; sometimes pip install xlsxwriter won't solve the issue. The code below works like a charm. Edit any part as you wish.
import functools
from decimal import Decimal
import openpyxl
def text_length(text):
    """
    Get the effective text length in characters, taking into account newlines
    """
    if not text:
        return 0
    lines = text.split("\n")
    return max(len(line) for line in lines)
def _to_str_for_length(v, decimals=3):
    """
    Like str() but rounds decimals to predefined length
    """
    if isinstance(v, float):
        # Round to [decimal] places
        return str(Decimal(v).quantize(Decimal('1.' + '0' * decimals)).normalize())
    else:
        return str(v)
def auto_adjust_xlsx_column_width(df, writer, sheet_name, margin=3, length_factor=1.0, decimals=3, index=False):
    sheet = writer.sheets[sheet_name]
    _to_str = functools.partial(_to_str_for_length, decimals=decimals)
    # Compute & set column width for each column
    for column_name in df.columns:
        # Convert the values of the column to strings and take the longest one
        # (or the header, if that is longer)
        column_length = max(df[column_name].apply(_to_str).map(text_length).max(), text_length(column_name)) + 5
        # Get index of column in XLSX
        # Column index is +1 if we also export the index column
        col_idx = df.columns.get_loc(column_name)
        if index:
            col_idx += 1
        # Set width of column to (column_length + margin)
        sheet.column_dimensions[openpyxl.utils.cell.get_column_letter(col_idx + 1)].width = column_length * length_factor + margin
    # Compute column width of index column (if enabled)
    if index:  # If the index column is being exported
        index_length = max(df.index.map(_to_str).map(text_length).max(), text_length(df.index.name))
        sheet.column_dimensions["A"].width = index_length * length_factor + margin
An openpyxl version based on @alichaudry's code.
The code 1) loads an excel file, 2) adjusts column widths and 3) saves it.
def auto_adjust_column_widths(excel_file : "Excel File Path", extra_space = 1) -> None:
    """
    Adjusts column widths of the excel file and replaces it with the adjusted one.
    Adjusting columns is based on the lengths of columns values (including column names).
    Parameters
    ----------
    excel_file :
        excel_file to adjust column widths.
    extra_space :
        extra column width in addition to the value-based-widths
    """
    import pandas as pd
    from openpyxl import load_workbook
    from openpyxl.utils import get_column_letter
    wb = load_workbook(excel_file)
    for ws in wb:
        df = pd.DataFrame(ws.values,)
        # Series.items() replaces iteritems(), which was removed in pandas 2.0
        for i,r in (df.astype(str).applymap(len).max(axis=0) + extra_space).items():
            ws.column_dimensions[get_column_letter(i+1)].width = r
    wb.save(excel_file)

Pandas dataframe. Add an additional row header merging all columns

I want to add a "second" header to my excel using pandas dataframe.
The Excel file has its values and header. But I want to add a new row above the header with just one column (spanning all the header columns), with the text centered.
Something like this:
How can I do this?
Use MultiIndex.from_product, but text is not centered:
df.columns = pd.MultiIndex.from_product([['Result'], df.columns])
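To see what that line does to the columns, here is a small sketch with a made-up two-column frame: every existing column name gets paired with the new top-level label 'Result'.

```python
import pandas as pd

# Made-up frame for illustration
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Cartesian product of ['Result'] and the existing names -> two-level header
df.columns = pd.MultiIndex.from_product([['Result'], df.columns])
print(df.columns.tolist())
print(df)
```

The data is unchanged; only the header gains a second level, and a column is now addressed with a tuple such as df[('Result', 'A')].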
EDIT:
import numpy as np
import pandas as pd
# Creating a DataFrame
df = pd.DataFrame(np.random.randn(8, 6), columns=list('ABCDEF'))
# Create a Pandas Excel writer using XlsxWriter engine.
writer = pd.ExcelWriter("test.xlsx", engine='xlsxwriter')
# Write the data one row down, leaving row 0 free for the merged header
df.to_excel(writer, sheet_name='Sheet1', startrow=1, index=False)
# Get workbook and worksheet objects
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Create custom style: centered text
merge_format = workbook.add_format({'align': 'center'})
# Set merge_range by the number of column names
len_cols = len(df.columns)
worksheet.merge_range(0, 0, 0, len_cols - 1, 'Result', merge_format)
writer.close()

Python: xlsxwriter highlight cells by range without condition

I have a dataframe with 3 columns.
I would like to highlight column a in orange, column b in green, and column c in yellow, but only down to the last row of data.
using xlsxwriter I found examples for highlighting the entire column with ".add_format" but I didn't want the entire column to be highlighted.
How can I use xlsxwriter to highlight specific cells without using ".conditional_format"?
df = {'a': ['','',''],
      'b':[1,2,2],
      'c':[1,2,2]}
With xlsxwriter I apply formats in two different ways: with the function set_column (if you don't mind the format extending to the end of the file), and with for loops if I do not want the format to extend to the end of the file (for example, for borders and background colors).
So this is how you can apply format to your dataframe:
import pandas as pd
# Create a test df
data = {'a': ['','',''], 'b': [1,2,2], 'c': [1,2,2]}
df = pd.DataFrame(data)
# Import the file through xlsxwriter
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Define the formats
format_orange = workbook.add_format({'bg_color': 'orange'})
format_green = workbook.add_format({'bg_color': 'green'})
format_bold = workbook.add_format({'bold': True, 'align': 'center'})
# Start iterating through the columns and the rows to apply the format
for row in range(df.shape[0]):
    worksheet.write(row+1, 0, df.iloc[row,0], format_orange)
# Alternative syntax
#for row in range(df.shape[0]):
#    worksheet.write(f'A{row+2}', df.iloc[row,0], format_orange)
for row in range(df.shape[0]):
    worksheet.write(row+1, 1, df.iloc[row,1], format_green)
# Here you can use the faster set_column function as you do not apply color
worksheet.set_column('C:C', 15, format_bold)
# Finally write the file
writer.close()
Output:

Writing to a specific range/Column with Pandas

I'm attempting to copy from column Range AP:AR of workbook 1 to Range A:C of workbook 2 through Pandas data frames.
I have successfully read the data frame below in workbook 1, I then want to write this into workbook 2 of the specified range. So AP:AR to AQ:AS.
I have tried:
#df.to_excel(writer, 'AP')
I have also tried the following:
#df = pd.write_excel(filename, skiprows = 2, parse_cols = 'AP:AR')
pd.writer = pd.ExcelWriter('output.xlsx', columns = 'AP:AR')
pd.writer.save()
For example:
filename ='C:/ workbook 1.xlsx'
df = pd.read_excel(filename, skiprows = 2, parse_cols = 'A:C')
import pandas as pd
writer = pd.ExcelWriter('C:/DRAX/ workbook 2.xlsx')
df.to_excel(writer, 'AQ')
writer.save()
print(df)
It reads correctly, but writes to Cell column ‘B’ instead of AQ.
You have to specify the column at which to start writing the dataframe with the startcol parameter, which is a zero-based integer:
So you should change the line
df.to_excel(writer, 'AQ')
to
df.to_excel(writer, startcol=42) # AQ has the index of 42
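The index 42 does not have to be counted by hand. A small pure-Python sketch (the function name excel_col_to_index is hypothetical) converts a column letter such as AQ into the zero-based number that startcol expects, treating the letters as a base-26 number with A=1.

```python
def excel_col_to_index(letters):
    """Hypothetical helper: 'A' -> 0, 'Z' -> 25, 'AA' -> 26, ..."""
    index = 0
    for ch in letters.upper():
        # Each letter is one base-26 digit, with A=1 ... Z=26
        index = index * 26 + (ord(ch) - ord('A') + 1)
    return index - 1  # shift to zero-based, as startcol expects

print(excel_col_to_index('AQ'))  # 42
```

It could be used as df.to_excel(writer, startcol=excel_col_to_index('AQ')).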
Results:
