I am trying to create a script using Python and openpyxl to open a given Excel sheet and merge all cells in a given row together until the script finds a cell containing a string. The row placement is always the same, but the number of columns and the column placement of the strings are not, so it needs to be dynamic. Once a new string is found, I want to continue merging cells until the column right before the grand total. There are also cases where a cell doesn't need to be merged, because there is no empty cell in the data set to merge it with.
I found this answer here, which is doing a similar procedure except it is merging rows instead of columns. I was able to refactor part of this to create a list of the cells that have strings in my workbook, but am struggling on next steps. Any thoughts?
import openpyxl
from openpyxl.utils import get_column_letter
wb1 = openpyxl.load_workbook('stackoverflow question.xlsx')
ws1 = wb1['ws1']  # select the sheet by name; wb1.worksheets[...] only takes an integer index
columns_with_strings = []
merge_row = '3'  # the data to merge will always be in this row
for col in range(2, ws1.max_column - 1):
    if ws1[get_column_letter(col) + merge_row].value is not None:
        columns_with_strings.append(get_column_letter(col) + merge_row)
The above code yields this list which includes the correct cells that contain strings and need to be checked for merging:
['C3', 'F3', 'J3']
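Once the string columns are known, the remaining step is turning them into (start, end) column spans. A minimal sketch in plain Python, assuming the cells have already been converted to 1-based column numbers (for example with `column_index_from_string(coordinate_from_string(c)[0])` from `openpyxl.utils.cell`) and that `last_col`, a hypothetical parameter, is the column just before the grand total:

```python
def merge_spans(string_cols, last_col):
    """Return (start_col, end_col) pairs to merge, using 1-based column numbers.

    string_cols: sorted column numbers that contain a string, e.g. [3, 6, 10] for C3, F3, J3.
    last_col: hypothetical parameter, the column right before the grand total.
    """
    spans = []
    # Each span ends one column before the next string; the last one ends at last_col.
    next_starts = list(string_cols[1:]) + [last_col + 1]
    for start, next_start in zip(string_cols, next_starts):
        end = next_start - 1
        if end > start:  # nothing to merge if there is no empty cell after the string
            spans.append((start, end))
    return spans


print(merge_spans([3, 6, 10], 12))  # [(3, 5), (6, 9), (10, 12)]
```

Each pair can then be passed to `ws.merge_cells(start_row=3, start_column=start, end_row=3, end_column=end)`; strings with no empty neighbour simply produce no span.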
This is how the workbook looks now:
This is how I am trying to get it to look in the end:
To complete your code, you can use worksheet.merge_cells together with worksheet.cell(...).alignment:
from openpyxl import load_workbook
from openpyxl.styles import Alignment
wb = load_workbook("tmp/stackoverflow question.xlsx")
ws = wb["Sheet1"]
merge_row = 3
# here, we get the column indices for every non-null cell in row 3
# and after that, we center the text in the last cell
idx_col_strings = [cell.column for cell in ws[merge_row] if cell.value]
ws.cell(merge_row, idx_col_strings[-1]).alignment = Alignment(horizontal="center")
# here, we loop through each range until the last non-null cell in row 3,
# merge once per transition (non-null => null),
# and finally center the text in each merged cell
for i in range(len(idx_col_strings) - 1):
    start_col, end_col = idx_col_strings[i], idx_col_strings[i + 1] - 1
    ws.merge_cells(start_row=merge_row, start_column=start_col,
                   end_row=merge_row, end_column=end_col)
    ws.cell(merge_row, start_col).alignment = Alignment(horizontal="center")
wb.save("tmp/stackoverflow answer.xlsx")
To start, if you aren't familiar with openpyxl's merge and unmerge functions, I recommend you read about them in the documentation (https://openpyxl.readthedocs.io/en/stable/usage.html#merge-unmerge-cells) to get a sense of how this works.
Here is base code that should provide the functionality you want, but some values may need to be tweaked for your device or spreadsheet.
import openpyxl  # Necessary imports.
from openpyxl.utils import get_column_letter
from openpyxl.utils.cell import coordinate_from_string
from openpyxl.utils.cell import column_index_from_string
wb1 = openpyxl.load_workbook('stackoverflow question.xlsx')  # Start of your code.
ws1 = wb1.worksheets[0]
columns_with_strings = []
merge_row = '3'  # the data to merge will always be in this row
for col in range(2, ws1.max_column):
    if ws1[get_column_letter(col) + merge_row].value is not None:
        columns_with_strings.append(get_column_letter(col) + merge_row)  # End of your code.
prior_string = columns_with_strings[0]  # Set "prior_string" to the first detected string.
for cell in columns_with_strings[1:]:
    letter, row = coordinate_from_string(cell)  # Split e.g. "F3" into ("F", 3).
    prior_col = column_index_from_string(letter) - 1  # Column just left of the current string.
    start_col = column_index_from_string(coordinate_from_string(prior_string)[0])
    # Compare column numbers, not strings ("AA3" < "B3" as text), and only merge
    # when there is at least one empty cell between the two strings.
    if prior_col > start_col:
        ws1.merge_cells(f'{prior_string}:{get_column_letter(prior_col)}{row}')  # Merge the cells.
    prior_string = cell  # Set the current string to be the prior string.
last_row = coordinate_from_string(prior_string)[1]
ws1.merge_cells(f'{prior_string}:{get_column_letter(ws1.max_column)}{last_row}')  # Merge the last string to the end (the last column).
wb1.save("stackoverflow question.xlsx")  # Save the file changes.
I hope this helps to point you in the right direction!
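The merge and unmerge calls from the documentation can be wrapped in a small helper. A sketch, assuming the (start, end) column spans are already known as 1-based numbers; the helper name `merge_header_groups` is hypothetical, and a one-cell span is only centered, matching the "no empty cell to merge with" case from the question:

```python
from openpyxl import Workbook
from openpyxl.styles import Alignment


def merge_header_groups(ws, row, spans):
    """Merge each (start_col, end_col) span in `row` and center its text.

    Returns the coordinates of all merged ranges on the sheet, sorted.
    """
    for start_col, end_col in spans:
        ws.cell(row, start_col).alignment = Alignment(horizontal="center")
        if end_col > start_col:  # a single cell is left unmerged
            ws.merge_cells(start_row=row, start_column=start_col,
                           end_row=row, end_column=end_col)
    return sorted(rng.coord for rng in ws.merged_cells.ranges)


wb = Workbook()
ws = wb.active
ws["C3"] = "Group A"
ws["F3"] = "Group B"
print(merge_header_groups(ws, 3, [(3, 5), (6, 9)]))  # ['C3:E3', 'F3:I3']
ws.unmerge_cells("C3:E3")  # merges can be undone by range string
```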
Based on @timeless's answer, I've cleaned the code up a bit to make better use of Python's tools and the openpyxl API:
from itertools import zip_longest
from openpyxl import Workbook
from openpyxl.styles import Alignment, PatternFill, NamedStyle
wb = Workbook()
ws = wb.active
ws.append([])
ws.append([])
ws.append([None, None, "Group A", None, None, "Group B", None, None, None, "Group C"])
# get column indices for header cells
headings = [cell.column for cell in next(ws.iter_rows(min_row=3, max_row=3)) if cell.value]
fill = PatternFill(patternType="solid", fgColor="DDDDDD")
alignment = Alignment(horizontal="center")
header_style = NamedStyle(alignment=alignment, fill=fill, name="Header")
wb.add_named_style(header_style)  # register the style with the workbook
# create ranges for merged cells from the list of header cells: each range ends one column
# before the start of the next one. Use zip_longest for the final value
for start_column, end_column in zip_longest(headings, headings[1:], fillvalue=headings[-1] + 1):
    ws.cell(3, start_column).style = header_style
    ws.merge_cells(start_row=3, end_row=3, start_column=start_column, end_column=end_column - 1)
wb.save("merged.xlsx")
Using the API wherever possible generally leads to more manageable and generic code.
I tried many functions and existing solutions to get the output I want, but I can't seem to get an Excel output that keeps the formatting I try to apply.
It seems that all the existing pandas functions work only with identically labelled indexes or with files of the same shape. In my situation the shapes of the two files are (757, 26) for, let's say, file1 and (688, 39) for file2; the first 26 columns are labelled the same way in both files.
Is there a way to merge these two files, highlight the differences as indicated in the title, and create an Excel output with the formatting still present?
Here is what I tried:
import pandas as pd
import numpy as np
import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl import Workbook
import pandas.io.formats.style as style
dfcurr=pd.read_excel(r'IPH4201P2 - DRATracker_Current.xlsx')
dfprev=pd.read_excel(r'IPH4201P2 - DRATracker_Previous.xlsx')
dfprev=dfprev.loc[(dfprev['Subject'].str.contains('M'))|(dfprev['Subject'].str.contains('S'))]
dfprev=dfprev.reset_index()
dfprev=dfprev.drop(columns='index')
df_diff=pd.merge(dfcurr,dfprev,how='left',indicator=True)
common_columns = df_diff.columns.intersection(dfprev.columns)
compare_df = df_diff[common_columns].eq(dfprev[common_columns])
compare_df.to_excel('comp.xlsx')
# Convert dataframe to string
df_diff = df_diff.astype(str)
def highlight_diff(data, compare):
    if type(data) != pd.DataFrame:
        data = pd.DataFrame(data)
    if type(compare) != pd.DataFrame:
        compare = pd.DataFrame(compare)
    result = []
    for col in data.columns:
        if col in compare.columns and (data[col] != compare[col]).any():
            result.append('background-color: #DAEEF3')
        elif col not in compare.columns:
            result.append('background-color: #E4DFEC')
        else:
            result.append('background-color: white')
    return result
# Create a new workbook and add a new worksheet
wb = Workbook()
ws = wb.active
# Write the dataframe to the worksheet
for r in dataframe_to_rows(df_diff.style.apply(highlight_diff, compare=compare_df).data, index=False, header=True):
    ws.append(r)
# Save the workbook
wb.save('Merged_style.xlsx')
However, I do not get an output with the style applied; no cells are highlighted in the color I want them to be highlighted in.
Edit:
I tried a different approach to highlight the cells in the excel, the function used for this approach comes from here:
import pandas as pd
import numpy as np
import openpyxl
import pandas.io.formats.style as style
dfcurr=pd.read_excel(r'IPH4201P2 - DRATracker_Current.xlsx')
dfprev=pd.read_excel(r'IPH4201P2 - DRATracker_Previous.xlsx')
dfprev=dfprev.loc[(dfprev['Subject'].str.contains('M'))|(dfprev['Subject'].str.contains('S'))]
dfprev=dfprev.reset_index()
dfprev=dfprev.drop(columns='index')
new='background-color: #DAEEF3'
change='background-color: #E4DFEC'
df_diff=pd.merge(dfcurr,dfprev,on=['Subject','Visit','Visit Date','Site\nID','Cohort','Pathology','Clinical\nStage At\nScreening','TNMBA at\nScreening'],how='left',indicator=True)
for col in df_diff.columns:
    if '_y' in col:
        del df_diff[col]
    elif 'Unnamed: 1' in col:
        del df_diff[col]
    elif '_x' in col:
        df_diff.columns = df_diff.columns.str.rstrip('_x')
def highlight_diff(data, other, color='#DAEEF3'):
    # Define html attribute
    attr = 'background-color: {}'.format(color)
    # Where data != other set attribute
    return pd.DataFrame(np.where(data.ne(other), attr, ''),
                        index=data.index, columns=data.columns)
# Set axis=None so it passes the entire frame
df_diff=df_diff.style.apply(highlight_diff, axis=None, other=dfprev)
print(type(df_diff))
df_diff.to_excel('Diff.xlsx',engine='openpyxl',index=0)
This new method gives me an Excel file with the style applied. How can I update it to apply the color #DAEEF3 to rows in df_diff whose Subject, Visit and Visit Date combination is not present in dfprev, and the color #E4DFEC to cells that differ between the two files for matching Subject, Visit and Visit Date?
This code isn't doing anything:
df_diff.style.apply(highlight_diff, compare=compare_df).data
df_diff.style creates a Styler object.
.apply applies a function to that Styler for it to attach relevant HTML styling which it stores as a mapping in the Styler context.
.data then just retrieves the original DataFrame object that you created the Styler from. It has nothing to do with the HTML styling context you created for the Styler, so you are effectively discarding it with this final .data call.
Styler has its own to_excel method which interprets some of that HTML styling context and converts it to Excel cell coloring and formatting.
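For the follow-up in the question (new rows in #DAEEF3, changed cells in #E4DFEC), the style function can return one CSS string per cell and the Styler can be kept instead of calling `.data`. A minimal sketch, assuming Subject, Visit and Visit Date together identify a row and that the other columns share names between the two files; `diff_styles` is a hypothetical helper name:

```python
import pandas as pd

KEYS = ["Subject", "Visit", "Visit Date"]  # assumed row identifier, as in the question


def diff_styles(current, previous, keys=KEYS, new_color="#DAEEF3", changed_color="#E4DFEC"):
    """Return CSS strings shaped like `current`, for Styler.apply(..., axis=None).

    Rows whose key combination is missing from `previous` get `new_color`;
    for matching rows, cells whose value differs from `previous` get `changed_color`.
    """
    styles = pd.DataFrame("", index=current.index, columns=current.columns)
    prev = previous.drop_duplicates(subset=keys).set_index(keys)
    cur_keys = pd.MultiIndex.from_frame(current[keys])
    is_new = ~cur_keys.isin(prev.index)
    styles.loc[is_new, :] = f"background-color: {new_color}"

    # Compare only the non-key columns that exist in both frames.
    shared = [c for c in current.columns if c in prev.columns]
    if shared and (~is_new).any():
        old = prev.reindex(cur_keys[~is_new])[shared].to_numpy()
        new = current.loc[~is_new, shared].to_numpy()
        changed = (old != new) & ~(pd.isna(old) & pd.isna(new))  # NaN vs NaN is unchanged
        block = styles.loc[~is_new, shared].to_numpy(copy=True)
        block[changed] = f"background-color: {changed_color}"
        styles.loc[~is_new, shared] = block
    return styles
```

It would be used as `dfcurr.style.apply(diff_styles, axis=None, previous=dfprev).to_excel('Diff.xlsx', engine='openpyxl', index=False)`; with `axis=None` the whole frame is passed in and a same-shaped frame of CSS strings is expected back. Styler itself requires jinja2 to be installed.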
After asking around to people I know that had to do something similar, here is the final code that produces the expected output:
import pandas as pd
from openpyxl.styles import PatternFill
from openpyxl import load_workbook
#DATA FILES##################################################
#Set below to False to copy comments manually and keep the comment formatting.
copy_comments_automatically = True
#Update folderPath
folderPath = "C:/Users/G.Tielemans/OneDrive - Medpace/Documents/Innate/Script/DRA/"
#File names must match and files must be closed when running
dfcurr = pd.read_excel(folderPath + "IPH4201P2 - DRATracker_Current.xlsx")
dfprev = pd.read_excel(folderPath + "IPH4201P2 - DRATracker_Previous.xlsx")
#############################################################
#LOADING DATA################################################
dfprev = dfprev.loc[(dfprev['Subject'].str.contains('M'))|(dfprev['Subject'].str.contains('S'))]
dfprev = dfprev.reset_index()
dfprev = dfprev.drop(columns='index')
dfprevComments = dfprev.iloc[:, 29:]
#############################################################
#NEW LINES###################################################
def highlightNewLines(linecurr):
    currSubject = linecurr["Subject"]
    currVisit = linecurr["Visit"]
    currVisitDate = linecurr["Visit Date"]
    for index, row in dfprev.iterrows():
        if currSubject == row["Subject"] and currVisit == row["Visit"] and currVisitDate == row["Visit Date"]:
            return True
    return False
dfcurr["Duplicate?"] = dfcurr.apply(lambda row: highlightNewLines(row), axis = 1)
#############################################################
#FIND UPDATES################################################
dfDupes = dfcurr[dfcurr["Duplicate?"] == True]
dfDupeslen = len(dfDupes)
#indexes of new lines to paste at bottom of file and color
indexes = dfcurr[dfcurr["Duplicate?"] == False].index
dfDupes = dfDupes.drop("Duplicate?", axis = 1)
dfDupes = dfDupes.reset_index(drop=True)
dfprev = dfprev.iloc[:,0:26]
dfprev = dfprev.reset_index(drop=True)
difference = dfDupes[dfDupes!=dfprev]
#############################################################
#ATTACH NEW FINDINGS AND PASTE MEDPACE COMMENT COLUMNS#######
newfindings = dfcurr.loc[indexes]
newfindings = newfindings.drop("Duplicate?", axis = 1)
dfDupes = pd.concat([dfDupes, newfindings])
dfDupes = dfDupes.reset_index(drop=True)
dflen = len(dfDupes)
if copy_comments_automatically:
    dfDupes = pd.concat([dfDupes, dfprevComments], axis=1)
#############################################################
#COLORING####################################################
dfDupes.to_excel(folderPath + "IPH4201P2 - DRATracker_Output.xlsx", index = False)
wb = load_workbook(folderPath + "IPH4201P2 - DRATracker_Output.xlsx")
ws = wb.active
fillred = PatternFill(start_color="ffc7ce", end_color="ffc7ce", fill_type = "solid")
fillblue = PatternFill(start_color="99ccff", end_color="99ccff", fill_type = "solid")
for row in range(len(difference)):
    for column in range(len(difference.columns)):
        if pd.isnull(difference.iloc[row, column]) == False:
            ws.cell(row+2, column+1).fill = fillred
for row in range(dfDupeslen, dflen):
    for column in [2, 5, 6]:
        ws.cell(row+2, column).fill = fillblue
wb.save(folderPath + "IPH4201P2 - DRATracker_Output.xlsx")
#############################################################
print("Done")
I am being asked to generate some Excel reports. I am currently using pandas quite heavily for my data, so naturally I would like to use the pandas.ExcelWriter method to generate these reports. However the fixed column widths are a problem.
The code I have so far is simple enough. Say I have a dataframe called df:
writer = pd.ExcelWriter(excel_file_path, engine='openpyxl')
df.to_excel(writer, sheet_name="Summary")
I was looking over the pandas docs, and I don't really see any options to set column widths. Is there a trick to make it such that the columns auto-adjust to the data? Or is there something I can do after the fact to the xlsx file to adjust the column widths?
(I am using the OpenPyXL library, and generating .xlsx files - if that makes any difference.)
Inspired by user6178746's answer, I have the following:
# Given a dict of dataframes, for example:
# dfs = {'gadgets': df_gadgets, 'widgets': df_widgets}
writer = pd.ExcelWriter(filename, engine='xlsxwriter')
for sheetname, df in dfs.items():  # loop through `dict` of dataframes
    df.to_excel(writer, sheet_name=sheetname)  # send df to writer
    worksheet = writer.sheets[sheetname]  # pull worksheet object
    for idx, col in enumerate(df):  # loop through all columns
        series = df[col]
        max_len = max((
            series.astype(str).map(len).max(),  # len of largest item
            len(str(series.name))  # len of column name/header
        )) + 1  # adding a little extra space
        # +1 because to_excel writes the index into the first sheet column
        worksheet.set_column(idx + 1, idx + 1, max_len)  # set column width
writer.close()  # writes the file (ExcelWriter.save() was removed in pandas 2.0)
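The width calculation itself does not depend on the writer engine. A minimal sketch of it as a standalone function (hypothetical name `column_widths`) that also accounts for the index column(s) that `to_excel` writes when `index=True`; it assumes single-level column labels, and widths are character counts of `str(value)`, so formatted numbers and dates may display differently in Excel:

```python
import pandas as pd


def column_widths(df, index=True, padding=1, max_width=None):
    """Return one width per sheet column, in the order df.to_excel(..., index=index) writes them."""
    def longest(values, header):
        lengths = [len(str(v)) for v in values]
        lengths.append(0 if header is None else len(str(header)))
        return max(lengths)

    widths = []
    if index:
        # to_excel writes each index level into its own leading column.
        for level, name in enumerate(df.index.names):
            widths.append(longest(df.index.get_level_values(level), name))
    for pos, col in enumerate(df.columns):
        widths.append(longest(df.iloc[:, pos], col))
    widths = [w + padding for w in widths]
    if max_width is not None:
        widths = [min(w, max_width) for w in widths]
    return widths


df = pd.DataFrame({"name": ["a", "bbbb"], "n": [1, 12345]})
print(column_widths(df))  # [2, 5, 6]
```

With the xlsxwriter engine each entry maps to `worksheet.set_column(i, i, width)`; with openpyxl, to `ws.column_dimensions[get_column_letter(i + 1)].width = width`.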
Dynamically adjust all the column lengths
writer = pd.ExcelWriter('/path/to/output/file.xlsx', engine='xlsxwriter')  # set_column is an xlsxwriter method
df.to_excel(writer, sheet_name='sheetName', index=False, na_rep='NaN')
for column in df:
    column_length = max(df[column].astype(str).map(len).max(), len(column))
    col_idx = df.columns.get_loc(column)
    writer.sheets['sheetName'].set_column(col_idx, col_idx, column_length)
writer.close()
Manually adjust a column using Column Name
col_idx = df.columns.get_loc('columnName')
writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)
Manually adjust a column using Column Index
writer.sheets['sheetName'].set_column(2, 2, 15)  # zero-based column index, e.g. 2 for column C
In case any of the above is failing with
AttributeError: 'Worksheet' object has no attribute 'set_column'
make sure to install xlsxwriter:
pip install xlsxwriter
For a more comprehensive explanation you can read the article How to Auto-Adjust the Width of Excel Columns with Pandas ExcelWriter on TDS.
I'm posting this because I just ran into the same issue and found that the official documentation for XlsxWriter and pandas still lists this functionality as unsupported. I hacked together a solution that solved the issue I was having. I basically just iterate through each column and use worksheet.set_column to set the column width to the max length of the contents of that column.
One important note, however. This solution does not fit the column headers, simply the column values. That should be an easy change though if you need to fit the headers instead. Hope this helps someone :)
import pandas as pd
import sqlalchemy as sa
import urllib.parse
read_server = 'serverName'
read_database = 'databaseName'
read_params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER="+read_server+";DATABASE="+read_database+";TRUSTED_CONNECTION=Yes")
read_engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % read_params)
#Output some SQL Server data into a dataframe
my_sql_query = """ SELECT * FROM dbo.my_table """
my_dataframe = pd.read_sql_query(my_sql_query,con=read_engine)
#Set destination directory to save excel.
xlsFilepath = r'H:\my_project' + "\\" + 'my_file_name.xlsx'
writer = pd.ExcelWriter(xlsFilepath, engine='xlsxwriter')
#Write excel to file using pandas to_excel
my_dataframe.to_excel(writer, startrow = 1, sheet_name='Sheet1', index=False)
#Indicate workbook and worksheet for formatting
workbook = writer.book
worksheet = writer.sheets['Sheet1']
#Iterate through each column and set the width == the max length in that column. A padding length of 2 is also added.
for i, col in enumerate(my_dataframe.columns):
    # find length of column i
    column_len = my_dataframe[col].astype(str).str.len().max()
    # Setting the length if the column header is larger
    # than the max column value length
    column_len = max(column_len, len(col)) + 2
    # set the column length
    worksheet.set_column(i, i, column_len)
writer.close()
There is a nice package that I started to use recently called StyleFrame.
It takes a DataFrame and lets you style it very easily,
and by default the column widths are auto-adjusted.
For example:
from StyleFrame import StyleFrame
import pandas as pd
df = pd.DataFrame({'aaaaaaaaaaa': [1, 2, 3],
'bbbbbbbbb': [1, 1, 1],
'ccccccccccc': [2, 3, 4]})
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(excel_writer=excel_writer, row_to_add_filters=0,
            columns_and_rows_to_freeze='B2')
excel_writer.close()
you can also change the columns width:
sf.set_column_width(columns=['aaaaaaaaaaa', 'bbbbbbbbb'],
                    width=35.3)
UPDATE 1
In version 1.4 best_fit argument was added to StyleFrame.to_excel.
See the documentation.
UPDATE 2
Here's a sample of code that works for StyleFrame 3.x.x
from styleframe import StyleFrame
import pandas as pd
columns = ['aaaaaaaaaaa', 'bbbbbbbbb', 'ccccccccccc']
df = pd.DataFrame(data={
    'aaaaaaaaaaa': [1, 2, 3],
    'bbbbbbbbb': [1, 1, 1],
    'ccccccccccc': [2, 3, 4],
}, columns=columns)
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(
    excel_writer=excel_writer,
    best_fit=columns,
    columns_and_rows_to_freeze='B2',
    row_to_add_filters=0,
)
excel_writer.close()  # ExcelWriter.save() was removed in pandas 2.0
There is probably no automatic way to do it right now, but as you use openpyxl, the following line (adapted from another answer by user Bufke on how to do it manually) allows you to specify a sane value (in character widths):
writer.sheets['Summary'].column_dimensions['A'].width = 15
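A self-contained sketch of that openpyxl approach, assuming the width of each column is derived from the longest `str()` of its values or its header, that the index is not written, and using a `with` block so the file is written when the writer closes; `to_excel_fitted` is a hypothetical name:

```python
import io

import pandas as pd
from openpyxl.utils import get_column_letter


def to_excel_fitted(df, target, sheet_name="Summary", padding=2):
    """Write df (without its index) and size each column to its longest value or header."""
    with pd.ExcelWriter(target, engine="openpyxl") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
        ws = writer.sheets[sheet_name]  # an openpyxl Worksheet
        for pos, col in enumerate(df.columns, start=1):
            values = df.iloc[:, pos - 1]
            longest = max([len(str(v)) for v in values] + [len(str(col))])
            ws.column_dimensions[get_column_letter(pos)].width = longest + padding


buffer = io.BytesIO()  # a file path such as "report.xlsx" works the same way
to_excel_fitted(pd.DataFrame({"id": [1, 2], "description": ["short", "a much longer text"]}), buffer)
```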
By using pandas and xlsxwriter you can do this; the code below works in Python 3.x. For more details on working with XlsxWriter and pandas, this link might be useful: https://xlsxwriter.readthedocs.io/working_with_pandas.html
import pandas as pd
writer = pd.ExcelWriter(excel_file_path, engine='xlsxwriter')
df.to_excel(writer, sheet_name="Summary")
workbook = writer.book
worksheet = writer.sheets["Summary"]
# set the column width as per your requirement
worksheet.set_column('A:A', 25)
writer.close()
I found it more useful to adjust the column width based on the column header rather than the column content.
Using df.columns.values.tolist() I generate a list of the column headers and use the lengths of these headers to determine the widths of the columns.
See full code below:
import pandas as pd
import xlsxwriter  # not used directly, but the 'xlsxwriter' engine must be installed
writer = pd.ExcelWriter(filename, engine='xlsxwriter')
df.to_excel(writer, index=False, sheet_name=sheetname)
workbook = writer.book  # Access the workbook
worksheet = writer.sheets[sheetname]  # Access the Worksheet
header_list = df.columns.values.tolist()  # Generate list of headers
for i in range(0, len(header_list)):
    worksheet.set_column(i, i, len(header_list[i]))  # Set column widths based on len(header)
writer.close()  # Save the excel file
At work I am always writing dataframes to Excel files, so instead of writing the same code over and over, I have created a module. Now I just import it and use it to write and format the Excel files. There is one downside though: it takes a long time if the dataframe is very large.
So here is the code:
import os
import pandas as pd
def result_to_excel(output_name, dataframes_list, sheet_names_list, output_dir):
    out_path = os.path.join(output_dir, output_name)
    writerReport = pd.ExcelWriter(out_path, engine='xlsxwriter',
                                  datetime_format='yyyymmdd', date_format='yyyymmdd')
    workbook = writerReport.book
    # Add a header format (defined once and reused for every sheet).
    header_format = workbook.add_format({
        'bold': True,
        'border': 1,
        'fg_color': '#0000FF',
        'font_color': 'white'})
    # loop through the list of dataframes to save every dataframe into a new sheet in the excel file
    for i, dataframe in enumerate(dataframes_list):
        sheet_name = sheet_names_list[i]  # choose the sheet name from sheet_names_list
        dataframe.to_excel(writerReport, sheet_name=sheet_name, index=False, startrow=0)
        # Write the column headers with the defined format.
        worksheet = writerReport.sheets[sheet_name]
        for col_num, col_name in enumerate(dataframe.columns.values):
            worksheet.write(0, col_num, col_name, header_format)
        worksheet.autofilter(0, 0, 0, len(dataframe.columns) - 1)
        worksheet.freeze_panes(1, 0)
        # loop through the columns in the dataframe to get the width of the column
        for j, col in enumerate(dataframe.columns):
            max_width = max([len(str(s)) for s in dataframe[col].values] + [len(str(col)) + 2])
            # cap the width so columns don't get too wide
            max_width = min(max_width, 50)
            worksheet.set_column(j, j, max_width)
    writerReport.close()
    return out_path
Combining the other answers and comments and also supporting multi-indices:
def autosize_excel_columns(worksheet, df):
    autosize_excel_columns_df(worksheet, df.index.to_frame())
    autosize_excel_columns_df(worksheet, df, offset=df.index.nlevels)
def autosize_excel_columns_df(worksheet, df, offset=0):
    for idx, col in enumerate(df):
        series = df[col]
        max_len = max((
            series.astype(str).map(len).max(),
            len(str(series.name))
        )) + 1
        worksheet.set_column(idx+offset, idx+offset, max_len)
sheetname=...
df.to_excel(writer, sheet_name=sheetname, freeze_panes=(df.columns.nlevels, df.index.nlevels))
worksheet = writer.sheets[sheetname]
autosize_excel_columns(worksheet, df)
writer.close()
You can solve the problem by calling the following function, where df is the dataframe whose contents determine the widths and sheetname is the Excel sheet where you want the modifications to take place:
def auto_width_columns(df, sheetname):
    # relies on an existing pd.ExcelWriter named `writer` (xlsxwriter engine)
    # and assumes df was written with index=False
    worksheet = writer.sheets[sheetname]
    for i, col in enumerate(df.columns):
        column_len = max(df[col].astype(str).str.len().max(), len(col) + 2)
        worksheet.set_column(i, i, column_len)
from openpyxl.utils import get_column_letter
# ..
for col in _ws.columns:
    max_length = 0
    col_name = get_column_letter(col[0].column)  # e.g. "A"; avoids parsing str(col[0]) with regexes
    for cell in col:
        if cell.value is not None:  # skip empty cells
            max_length = max(max_length, len(str(cell.value)))
    adjusted_width = max_length + 2
    _ws.column_dimensions[col_name].width = adjusted_width
Yes, there is something you can do to the xlsx file afterwards to adjust the column widths.
Use xlwings to autofit columns. It's a pretty simple solution; see the last six lines of the example code. The advantage of this procedure is that you don't have to worry about font size, font type or anything else.
Requirement: an Excel installation.
import pandas as pd
import xlwings as xw
path = r"test.xlsx"
# Export your dataframe in question (here a small sample frame).
df = pd.DataFrame({"short": [1, 2], "a_much_longer_column_name": ["x", "some longer text"]})
df.to_excel(path)
# Autofit all columns with xlwings.
with xw.App(visible=False) as app:
    wb = app.books.open(path)
    for ws in wb.sheets:
        ws.autofit(axis="columns")
    wb.save(path)
    wb.close()
Easiest solution is to specify width of column in set_column method.
for worksheet in writer.sheets.values():
    worksheet.set_column(0, last_column_value, required_width_constant)
This function works for me, also fixes the index width
def write_to_excel(writer, X, sheet_name, sep_only=False):
    # writer = writer object
    # X = dataframe
    # sheet_name = name of sheet
    # sep_only = True: write only as a separate excel file, False: also write as a sheet to the writer object
    # output_folder and prefix_excel_save are expected to be defined globally
    if sheet_name == "":
        print("specify sheet_name!")
    else:
        X.to_excel(f"{output_folder}{prefix_excel_save}_{sheet_name}.xlsx")
        if not sep_only:
            X.to_excel(writer, sheet_name=sheet_name)
            # fix column widths
            worksheet = writer.sheets[sheet_name]  # pull worksheet object
            for idx, col in enumerate(X.columns):  # loop through all columns
                series = X[col]
                max_len = max((
                    series.astype(str).map(len).max(),  # len of largest item
                    len(str(series.name))  # len of column name/header
                )) + 1  # adding a little extra space
                worksheet.set_column(idx+1, idx+1, max_len)  # set column width (+1 because the index is in column 0)
            # fix index width
            max_len = pd.Series(X.index.values).astype(str).map(len).max() + 1
            worksheet.set_column(0, 0, max_len)
        if sep_only:
            print(f'{sheet_name} is written as separate file')
        else:
            print(f'{sheet_name} is written as separate file')
            print(f'{sheet_name} is written as sheet')
    return writer
call example:
writer = write_to_excel(writer, dataframe, "Statistical_Analysis")
I may be a bit late to the party, but this code works when using 'openpyxl' as your engine; sometimes pip install xlsxwriter won't solve the issue. The code below works like a charm. Edit any part as you wish.
import functools
from decimal import Decimal
import openpyxl
def text_length(text):
    """
    Get the effective text length in characters, taking into account newlines
    """
    if not text:
        return 0
    lines = str(text).split("\n")
    return max(len(line) for line in lines)
def _to_str_for_length(v, decimals=3):
    """
    Like str() but rounds decimals to predefined length
    """
    if isinstance(v, float):
        # Round to [decimal] places
        return str(Decimal(v).quantize(Decimal('1.' + '0' * decimals)).normalize())
    else:
        return str(v)
def auto_adjust_xlsx_column_width(df, writer, sheet_name, margin=3, length_factor=1.0, decimals=3, index=False):
    sheet = writer.sheets[sheet_name]
    _to_str = functools.partial(_to_str_for_length, decimals=decimals)
    # Compute & set column width for each column
    for column_name in df.columns:
        # Convert the values of the column to strings and take the longest (or the header, if longer)
        column_length = max(df[column_name].apply(_to_str).map(text_length).max(), text_length(column_name)) + 5
        # Get index of column in XLSX
        # Column index is +1 if we also export the index column
        col_idx = df.columns.get_loc(column_name)
        if index:
            col_idx += 1
        # Set width of column to (column_length + margin)
        sheet.column_dimensions[openpyxl.utils.cell.get_column_letter(col_idx + 1)].width = column_length * length_factor + margin
    # Compute column width of index column (if enabled)
    if index:  # If the index column is being exported
        index_length = max(df.index.map(_to_str).map(text_length).max(), text_length(df.index.name))
        sheet.column_dimensions["A"].width = index_length * length_factor + margin
An openpyxl version based on @alichaudry's code.
The code 1) loads an excel file, 2) adjusts column widths and 3) saves it.
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter
def auto_adjust_column_widths(excel_file: "Excel File Path", extra_space=1) -> None:
    """
    Adjusts column widths of the excel file and replaces it with the adjusted one.
    Adjusting columns is based on the lengths of columns values (including column names).
    Parameters
    ----------
    excel_file :
        excel_file to adjust column widths.
    extra_space :
        extra column width in addition to the value-based-widths
    """
    wb = load_workbook(excel_file)
    for ws in wb:
        df = pd.DataFrame(ws.values)
        # empty cells are None; count them as zero-length instead of len("None")
        lengths = df.fillna("").astype(str).apply(lambda column: column.str.len())
        for i, r in (lengths.max(axis=0) + extra_space).items():
            ws.column_dimensions[get_column_letter(i + 1)].width = r
    wb.save(excel_file)
I have following script which is converting a CSV file to an XLSX file, but my column size is very narrow. Each time I have to drag them with mouse to read data. Does anybody know how to set column width in openpyxl?
Here is the code I am using.
#!/usr/bin/python2.6
import csv
from openpyxl import Workbook
from openpyxl.cell import get_column_letter
f = open('users_info_cvs.txt', "rU")
csv.register_dialect('colons', delimiter=':')
reader = csv.reader(f, dialect='colons')
wb = Workbook()
dest_filename = r"account_info.xlsx"
ws = wb.worksheets[0]
ws.title = "Users Account Information"
for row_index, row in enumerate(reader):
    for column_index, cell in enumerate(row):
        column_letter = get_column_letter((column_index + 1))
        ws.cell('%s%s'%(column_letter, (row_index + 1))).value = cell
wb.save(filename = dest_filename)
You could estimate (or use a mono width font) to achieve this. Let's assume data is a nested array like
[['a1','a2'],['b1','b2']]
We can get the max characters in each column. Then set the width to that. Width is exactly the width of a monospace font (if not changing other styles at least). Even if you use a variable width font it is a decent estimation. This will not work with formulas.
from openpyxl.utils import get_column_letter
column_widths = []
for row in data:
    for i, cell in enumerate(row):
        if len(column_widths) > i:
            if len(cell) > column_widths[i]:
                column_widths[i] = len(cell)
        else:
            column_widths += [len(cell)]
for i, column_width in enumerate(column_widths, 1):  # ,1 to start at 1
    worksheet.column_dimensions[get_column_letter(i)].width = column_width
A bit of a hack but your reports will be more readable.
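The loop above assumes every cell is a string. A minimal variant (hypothetical name `column_widths_from_rows`) that also accepts numbers, `None` and ragged rows, and counts only the longest line of multi-line text:

```python
def column_widths_from_rows(rows):
    """Return the max display length per column for a nested list of cell values."""
    widths = []
    for row in rows:
        for i, value in enumerate(row):
            text = "" if value is None else str(value)
            length = max(len(line) for line in text.split("\n"))
            if i < len(widths):
                widths[i] = max(widths[i], length)
            else:
                widths.append(length)
    return widths


print(column_widths_from_rows([["a1", "a2"], ["b1", "bbb2"], [None, 12345]]))  # [2, 5]
```

The result can be fed to the same `worksheet.column_dimensions[get_column_letter(i)].width` loop, starting the enumeration at 1.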
My variation of Bufke's answer. Avoids a bit of branching with the array and ignores empty cells / columns.
Now fixed for non-string cell values.
# ws = your current worksheet
dims = {}
for row in ws.rows:
    for cell in row:
        if cell.value:
            dims[cell.column] = max((dims.get(cell.column, 0), len(str(cell.value))))
for col, value in dims.items():
    ws.column_dimensions[col].width = value
As of openpyxl version 3.0.3 you need to use
dims[cell.column_letter] = max((dims.get(cell.column_letter, 0), len(str(cell.value))))
as the openpyxl library will raise a TypeError if you pass column_dimensions a number instead of a column letter, everything else can stay the same.
Even more pythonic way to set the width of all columns that works at least in openpyxl version 2.4.0:
for column_cells in worksheet.columns:
    length = max(len(as_text(cell.value)) for cell in column_cells)
    worksheet.column_dimensions[column_cells[0].column].width = length
The as_text function should be something that converts the value to a proper length string, like for Python 3:
def as_text(value):
    if value is None:
        return ""
    return str(value)
With openpyxl 3.0.3 the best way to modify the columns is with the DimensionHolder object, which is a dictionary that maps each column to a ColumnDimension object.
ColumnDimension can get parameters as bestFit, auto_size (which is an alias of bestFit) and width.
Personally, auto_size doesn't work as expected and I had to use width and figured out that the best width for the column is len(cell_value) * 1.23.
To get the value of each cell it's necessary to iterate over each one, but I personally didn't use it because in my project I just had to write worksheets, so I got the longest string in each column directly on my data.
The example below just shows how to modify the column dimensions:
import openpyxl
from openpyxl.worksheet.dimensions import ColumnDimension, DimensionHolder
from openpyxl.utils import get_column_letter
wb = openpyxl.load_workbook("Example.xslx")
ws = wb["Sheet1"]
dim_holder = DimensionHolder(worksheet=ws)
for col in range(ws.min_column, ws.max_column + 1):
dim_holder[get_column_letter(col)] = ColumnDimension(ws, min=col, max=col, width=20)
ws.column_dimensions = dim_holder
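If you do want to derive the widths from the cells themselves, here is a minimal sketch that combines the DimensionHolder approach with the len(cell_value) * 1.23 rule of thumb quoted above. The function name fit_columns and the choice to skip empty columns are my own assumptions; 1.23 is the answer's empirical factor, not an openpyxl constant.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter
from openpyxl.worksheet.dimensions import ColumnDimension, DimensionHolder

# Empirical factor quoted in the answer above; it is not an openpyxl constant
WIDTH_FACTOR = 1.23


def fit_columns(ws, factor=WIDTH_FACTOR):
    """Rebuild ws.column_dimensions from the longest value in each column."""
    dim_holder = DimensionHolder(worksheet=ws)
    for col in range(ws.min_column, ws.max_column + 1):
        letter = get_column_letter(col)
        lengths = [
            len(str(value))
            for (value,) in ws.iter_rows(min_col=col, max_col=col, values_only=True)
            if value is not None
        ]
        if not lengths:
            continue  # no entry for empty columns, so they keep the default width
        # Passing index= keeps the dimension's own letter in sync with its key
        dim_holder[letter] = ColumnDimension(
            ws, index=letter, min=col, max=col, width=max(lengths) * factor
        )
    ws.column_dimensions = dim_holder


wb = Workbook()
ws = wb.active
ws.append(["id", "description"])
ws.append([1, "a fairly long description"])
fit_columns(ws)
```

Note that ColumnDimension defaults its index to 'A' when you don't pass one, which is why the sketch sets index explicitly.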
I had a problem with merged cells: autosizing didn't work correctly. If you have the same problem, you can solve it with the following code:
from openpyxl.utils import get_column_letter

for col in worksheet.columns:
    max_length = 0
    column = get_column_letter(col[0].column)  # column_dimensions needs the letter, not the number
    for cell in col:
        if cell.coordinate in worksheet.merged_cells:  # don't check merged cells
            continue
        if cell.value is not None:  # skip empty cells instead of catching errors
            max_length = max(max_length, len(str(cell.value)))
    adjusted_width = (max_length + 2) * 1.2
    worksheet.column_dimensions[column].width = adjusted_width
A slight improvement on the accepted answer above, which I think is more Pythonic (asking for forgiveness is better than asking for permission):
column_widths = []
for row in workSheet.iter_rows():
    for i, cell in enumerate(row):
        try:
            column_widths[i] = max(column_widths[i], len(str(cell.value)))
        except IndexError:
            column_widths.append(len(str(cell.value)))
for i, column_width in enumerate(column_widths):
    workSheet.column_dimensions[get_column_letter(i + 1)].width = column_width
You can also convert a column number to its letter through its ASCII code and pass that to column_dimensions (this works only for columns A to Z):
import openpyxl as xl
work_book = xl.load_workbook('file_location')
sheet = work_book['Sheet1']
column_number = 2
column = str(chr(64 + column_number))
sheet.column_dimensions[column].width = 20
work_book.save('file_location')
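To see where the chr() trick stops working, here is a small comparison against openpyxl's get_column_letter. The helper name letter_via_ascii is hypothetical; it just wraps the expression from the answer above.

```python
from openpyxl.utils import get_column_letter


def letter_via_ascii(column_number):
    """The chr() trick from the answer above: 'A' is code point 65, so 64 + n gives 'A'..'Z' for n = 1..26."""
    return chr(64 + column_number)


for n in (1, 26, 27, 28):
    # Columns 1..26 agree; from 27 on, chr() runs past 'Z' into punctuation
    print(n, letter_via_ascii(n), get_column_letter(n))
```

For column 27 the trick yields '[' while get_column_letter returns 'AA', so get_column_letter is the safer choice for wide sheets.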
Here is a more general, simplified solution for users new to the topic (not specific to this question).
If you want to change the width or the height of cells in openpyxl (version 3.0.9), you can do it simply by setting the height of a row through row_dimensions or the width of a column through column_dimensions.
import openpyxl
wb = openpyxl.Workbook()
sheet = wb["Sheet"]
sheet["A1"] = "Tall row"
sheet["B2"] = "Wide column"
# Change height of row 1
sheet.row_dimensions[1].height = 100
# Change width of column B
sheet.column_dimensions["B"].width = 50
wb.save("StackOverflow.xlsx")
This is my version, based on @Virako's code snippet:
from openpyxl.utils import get_column_letter

def adjust_column_width_from_col(ws, min_row, min_col, max_col):
    column_widths = []
    for i, col in enumerate(
        ws.iter_cols(min_col=min_col, max_col=max_col, min_row=min_row)
    ):
        # start every column at 0 so that index i always matches column i, even for empty columns
        column_widths.append(0)
        for cell in col:
            value = cell.value
            if value is not None:
                if not isinstance(value, str):
                    value = str(value)
                column_widths[i] = max(column_widths[i], len(value))
    for i, width in enumerate(column_widths):
        col_name = get_column_letter(min_col + i)
        ws.column_dimensions[col_name].width = width + 2
And here is how to use it:
adjust_column_width_from_col(ws, 1,1, ws.max_column)
All the answers above have the issue that col[0].column returns a number, while worksheet.column_dimensions[column] only accepts a column letter such as 'A', 'B' or 'C'. I've modified @Virako's code and it now works fine.
import openpyxl
from openpyxl.utils import get_column_letter
# ..
for col in _ws.columns:
    max_length = 0
    # col[0].column is a number, so convert it to a letter such as 'A'
    # (a regex on str(col[0]) can pick up digits in sheet names such as 'Sheet1')
    col_name = get_column_letter(col[0].column)
    for cell in col:
        if cell.value is not None:
            max_length = max(max_length, len(str(cell.value)))
    adjusted_width = max_length + 2
    _ws.column_dimensions[col_name].width = adjusted_width
This is a dirty fix. openpyxl does support auto-fit through the bestFit attribute (auto_size is an alias for it), but there is no method that sets it for you, so you have to set it on each column dimension:
import openpyxl
from openpyxl.utils import get_column_letter
wb = openpyxl.load_workbook("Example.xslx")
ws = wb["Sheet1"]
for i in range(1, ws.max_column+1):
ws.column_dimensions[get_column_letter(i)].bestFit = True
ws.column_dimensions[get_column_letter(i)].auto_size = True
Another approach without storing any state could be like this:
from itertools import chain
# Using `ws` as the Worksheet
for cell in chain.from_iterable(ws.iter_cols()):
    if cell.value:
        ws.column_dimensions[cell.column_letter].width = max(
            ws.column_dimensions[cell.column_letter].width,
            len(f"{cell.value}"),
        )
I had to change @User3759685's answer above to this when openpyxl was updated, because I was getting an error. @phihag reported this in the comments as well.
for column_cells in ws.columns:
    new_column_length = max(len(as_text(cell.value)) for cell in column_cells)
    new_column_letter = openpyxl.utils.get_column_letter(column_cells[0].column)
    if new_column_length > 0:
        ws.column_dimensions[new_column_letter].width = new_column_length + 1
Combining several of the suggestions above, and restricting merged-cell detection to horizontally merged cells only, I can offer this code:
from openpyxl.utils import get_column_letter

def adjust_width(ws):
    """
    Adjust width of the columns
    :param ws: worksheet
    :return: None
    """
    def is_merged_horizontally(cell):
        """
        Checks if cell is merged horizontally with another cell
        :param cell: cell to check
        :return: True if cell is merged horizontally with another cell, else False
        """
        cell_coor = cell.coordinate
        if cell_coor not in ws.merged_cells:
            return False
        for rng in ws.merged_cells.ranges:
            if cell_coor in rng and len(list(rng.cols)) > 1:
                return True
        return False

    for col_number, col in enumerate(ws.columns, start=1):
        col_letter = get_column_letter(col_number)
        max_length = max(
            (len(str(cell.value or "")) for cell in col if not is_merged_horizontally(cell)),
            default=0,  # a column made only of merged cells would otherwise raise ValueError
        )
        adjusted_width = (max_length + 2) * 0.95
        ws.column_dimensions[col_letter].width = adjusted_width
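A self-contained variant of the function above, with a small check that a merged title row does not blow up the width of column A. It swaps the per-cell scan of ws.merged_cells.ranges for a set of coordinates built once from each range's min/max bounds; the helper name horizontally_merged and the padding/scale parameters are my own choices, not part of the original answer.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter


def horizontally_merged(ws):
    """Return the coordinates of all cells inside merged ranges that span 2+ columns."""
    coords = set()
    for rng in ws.merged_cells.ranges:
        if rng.max_col > rng.min_col:
            for row in range(rng.min_row, rng.max_row + 1):
                for col in range(rng.min_col, rng.max_col + 1):
                    coords.add(f"{get_column_letter(col)}{row}")
    return coords


def adjust_width(ws, padding=2, scale=0.95):
    skip = horizontally_merged(ws)  # computed once instead of once per cell
    for col in ws.iter_cols():
        letter = get_column_letter(col[0].column)
        max_length = max(
            (len(str(cell.value)) for cell in col
             if cell.value is not None and cell.coordinate not in skip),
            default=0,
        )
        ws.column_dimensions[letter].width = (max_length + padding) * scale


wb = Workbook()
ws = wb.active
ws.append(["A long report title merged across two columns"])
ws.merge_cells("A1:B1")
ws.append(["abc", "defgh"])
adjust_width(ws)
```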
After updating from openpyxl 2.5.2a to the latest 2.6.4 (the final version supporting Python 2.x), I ran into the same issue when setting the width of a column.
Basically I always calculate the width for a column (dims is a dict maintaining each column width):
dims[cell.column] = max((dims.get(cell.column, 0), len(str(cell.value))))
Afterwards I set the width to something slightly bigger than the original size, but now you have to pass the column letter rather than an int (col below is the column number, translated to the right letter):
worksheet.column_dimensions[get_column_letter(col)].width = value +1
This fixes the visible error and assigns the right width to your column ;)
Hope this helps.
I made a function that is very fast with large Excel files because it uses pandas.read_excel
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter
def auto_adjust_column_width(file_path, sheet_name=0):
    column_widths = []
    df = pd.read_excel(file_path, sheet_name=sheet_name, header=None)
    for col in df.columns:
        max_length = int(df[col].astype(str).str.len().max() * 1.2)
        column_widths.append(max_length)
    wb = load_workbook(file_path)
    if isinstance(sheet_name, int):
        sheet_name = wb.sheetnames[sheet_name]
    worksheet = wb[sheet_name]
    for i, column_width in enumerate(column_widths):
        column = get_column_letter(i + 1)
        worksheet.column_dimensions[column].width = column_width
    wb.save(file_path)
When this came up for me, I just did everything I wanted to do with openpyxl, saved the workbook, and opened it again with pywin32. Pywin32 has autofit built in without having to make a bunch of rules/conditions.
Edit: I should note that pywin32 only works on Windows, and this approach also needs Excel installed, since it drives Excel itself through COM.
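import os
from win32com.client import Dispatch
excel = Dispatch('Excel.Application')
# Excel resolves relative paths against its own default folder, so pass an absolute path
wb = excel.Workbooks.Open(os.path.abspath("excelFile.xlsx"))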
excel.Worksheets(1).Activate()
excel.ActiveSheet.Columns.AutoFit()
wb.Save()
wb.Close()
excel.Quit()
I did add a rule, however, because I had one text column that had some long values I didn't need to show. I limited any column to 75 characters.
excel = Dispatch('Excel.Application')
wb = excel.Workbooks.Open("excelFile.xlsx")
excel.Worksheets(1).Activate()
excel.ActiveSheet.Columns.AutoFit()
for col in excel.ActiveSheet.Columns:
    if col.ColumnWidth > 75:
        col.ColumnWidth = 75
wb.Save()
wb.Close()
excel.Quit()
Just insert the code below into your file:
# Importing the necessary modules
try:
    from openpyxl.cell import get_column_letter  # older openpyxl versions
except ImportError:
    from openpyxl.utils import get_column_letter  # newer openpyxl versions
# sheet is your worksheet
for column_cells in sheet.columns:
    # count empty cells as 0 instead of len("None")
    new_column_length = max(len(str(cell.value)) if cell.value is not None else 0 for cell in column_cells)
    new_column_letter = get_column_letter(column_cells[0].column)
    if new_column_length > 0:
        sheet.column_dimensions[new_column_letter].width = new_column_length * 1.23
Here is an answer for Python 3.8 and OpenPyXL 3.0.0.
I tried to avoid using the get_column_letter function but failed.
This solution uses the newly introduced assignment expressions aka "walrus operator":
import openpyxl
from openpyxl.utils import get_column_letter
workbook = openpyxl.load_workbook("myxlfile.xlsx")
worksheet = workbook["Sheet1"]
MIN_WIDTH = 10
for i, column_cells in enumerate(worksheet.columns, start=1):
    width = (
        length
        if (length := max(len(str(cell_value) if (cell_value := cell.value) is not None else "")
                          for cell in column_cells)) >= MIN_WIDTH
        else MIN_WIDTH
    )
    worksheet.column_dimensions[get_column_letter(i)].width = width
Since openpyxl 2.6.1, setting the width requires the column letter, not the column number:
for column in sheet.columns:
    length = max(len(str(cell.value)) for cell in column)
    length = length if length <= 16 else 16
    sheet.column_dimensions[column[0].column_letter].width = length
I need advice on setting styles in Openpyxl.
I see that the NumberFormat of a cell can be set, but I also require setting of font colors and attributes (bold etc). There is a style.py class but it seems I can't set the style attribute of a cell, and I don't really want to start tinkering with the openpyxl source code.
Has anyone found a solution to this?
As of openpyxl version 1.5.7, I have successfully applied the following worksheet style options...
from openpyxl.reader.excel import load_workbook
from openpyxl.workbook import Workbook
from openpyxl.styles import Color, Fill
from openpyxl.cell import Cell
# Load the workbook...
book = load_workbook('foo.xlsx')
# define ws here, in this case I pick the first worksheet in the workbook...
# NOTE: openpyxl has other ways to select a specific worksheet (i.e. by name
# via book.get_sheet_by_name('someWorksheetName'))
ws = book.worksheets[0]
## ws is an openpyxl worksheet object
_cell = ws.cell('C1')
# Font properties
_cell.style.font.color.index = Color.GREEN
_cell.style.font.name = 'Arial'
_cell.style.font.size = 8
_cell.style.font.bold = True
_cell.style.alignment.wrap_text = True
# Cell background color
_cell.style.fill.fill_type = Fill.FILL_SOLID
_cell.style.fill.start_color.index = Color.DARKRED
# You should only modify column dimensions after you have written a cell in
# the column. Perfect world: write column dimensions once per column
#
ws.column_dimensions["C"].width = 60.0
FYI, you can find the names of the colors in openpyxl/style.py... I sometimes patch in extra colors from the X11 color names:
class Color(HashableObject):
    """Named colors for use in styles."""
    BLACK = 'FF000000'
    WHITE = 'FFFFFFFF'
    RED = 'FFFF0000'
    DARKRED = 'FF800000'
    BLUE = 'FF0000FF'
    DARKBLUE = 'FF000080'
    GREEN = 'FF00FF00'
    DARKGREEN = 'FF008000'
    YELLOW = 'FFFFFF00'
    DARKYELLOW = 'FF808000'
For openpyxl version 2.4.1 and above, use the code below to set the font color:
from openpyxl.styles import Font
ws1['A1'].font = Font(color="FF0000")
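For comparison, here is a sketch of the 1.5.7 example above rewritten with the style objects used from 2.x onward: one new Font, Alignment and PatternFill assigned to the cell instead of mutating cell.style. The hex values are the Color.GREEN and Color.DARKRED constants listed above; the cell text is made up for the example.

```python
from openpyxl import Workbook
from openpyxl.styles import Alignment, Font, PatternFill

wb = Workbook()
ws = wb.active
cell = ws["C1"]
cell.value = "Styled text"

# Font properties: green, Arial, size 8, bold (ARGB hex instead of Color.GREEN)
cell.font = Font(name="Arial", size=8, bold=True, color="FF00FF00")
cell.alignment = Alignment(wrap_text=True)

# Cell background: a solid dark-red fill (the old Color.DARKRED value)
cell.fill = PatternFill(fill_type="solid", start_color="FF800000", end_color="FF800000")

ws.column_dimensions["C"].width = 60.0
```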
Hex codes for various colors can be found at:
http://dmcritchie.mvps.org/excel/colors.htm
As of openpyxl 2.0, styles are immutable.
If you have a cell, you can (e.g.) set bold text by:
cell.style = cell.style.copy(font=cell.style.font.copy(bold=True))
Yes, this is annoying.
As of openpyxl 2.0, setting cell styles is done by creating new style objects and by assigning them to properties of a cell.
There are several style objects: Font, PatternFill, Border, and Alignment. See the doc.
To change a style property of a cell, first you either have to copy the existing style object from the cell and change the value of the property or you have to create a new style object with the desired settings. Then, assign the new style object to the cell.
Example of setting the font to bold and italic of cell A1:
from openpyxl import Workbook
from openpyxl.styles import Font
# Create workbook
wb = Workbook()
# Select active sheet
ws = wb.active  # active is a property, not a method
# Select cell A1
cell = ws['A1']
# Make the text of the cell bold and italic
cell.font = cell.font.copy(bold=True, italic=True)
This seems like a feature that has changed a few times. I am using openpyxl 2.5.0, and I was able to set the strike-through option this way:
from copy import copy
new_font = copy(cell.font)
new_font.strike = True
cell.font = new_font
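A small self-contained demonstration of why the copy step is needed in recent versions: two cells share one Font, assigning to the proxied font raises AttributeError, and copying then reassigning changes only the one cell. The Calibri/bold starting font is an arbitrary choice for the example.

```python
from copy import copy

from openpyxl import Workbook
from openpyxl.styles import Font

wb = Workbook()
ws = wb.active
shared = Font(name="Calibri", bold=True)
ws["A1"].font = shared
ws["A2"].font = shared

# Mutating the font in place is rejected: cell.font is a read-only proxy
try:
    ws["A1"].font.strike = True
    mutation_raised = False
except AttributeError:
    mutation_raised = True

# Copy the style object, change the copy, then reassign it to the one cell
new_font = copy(ws["A1"].font)
new_font.strike = True
ws["A1"].font = new_font
```

The copy keeps the other attributes (here bold), and A2 still has the original, non-struck font.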
It seems like earlier versions (1.9 to 2.4?) had a copy method on the font that is now deprecated and raises a warning:
cell.font = cell.font.copy(strike=True)
Versions up to 1.8 had mutable fonts, so you could just do this:
cell.font.strike=True
That now raises an error.
Updated (2021) way of changing the font in openpyxl:
sheet["A1"].font = Font(size=23, underline='single', color='FFBB00', bold=True, italic=True)
Full code:
import openpyxl  # Connect the library
from openpyxl.styles import Font, PatternFill  # Connect styles for text and cells
wb = openpyxl.Workbook()  # Create a workbook
work_sheet = wb.create_sheet(title='Testsheet')  # Create a sheet with a name (create_sheet does not make it the active sheet)
work_sheet['A1'] = 'Test text'
work_sheet_a1 = work_sheet['A1']  # A variable that holds cell A1 with the existing text
# Apply these parameters to the text: size 23, single underline, color FFBB00 (the text color is given in RGB),
# bold and italic. If you do not need a bold font, use bold=False; likewise italic=False for a non-italic font.
work_sheet_a1.font = Font(size=23, underline='single', color='FFBB00', bold=True, italic=True)
# Important:
# older versions also let you use standard colors defined in the styles module (colors.RED);
# openpyxl 3.x no longer provides colors.RED, so pass the equivalent hex value instead:
work_sheet_a1.font = Font(size=23, underline='single', color='FFFF0000', bold=True, italic=True)
# This fills the cell with a solid background color
work_sheet_a1.fill = PatternFill(fill_type='solid', start_color='ff8327', end_color='ff8327')
As the openpyxl documentation says:
This is an open source project, maintained by volunteers in their spare time. This may well mean that particular features or functions that you would like are missing.
I checked the openpyxl source code and found that:
Up to openpyxl 1.8.x, styles were mutable, and their attributes could be assigned directly like this:
from openpyxl.workbook import Workbook
from openpyxl.style import Color
wb = Workbook()
ws = wb.active
ws['A1'].style.font.color.index = Color.RED
However, as of openpyxl 1.9, styles are immutable.
Styles are shared between objects, and once they have been assigned they cannot be changed. This prevents unwanted side effects such as changing the style of many cells instead of only one.
To create a new style object, you can either create it directly or copy one from an existing cell's style and change the attributes you need. Here is an example that answers the question:
from copy import copy
from openpyxl import Workbook
from openpyxl.styles import Font
wb = Workbook()
ws = wb.active
a1 = ws['A1']
d4 = ws['D4']
# create a new style with the required attributes
# (colors.RED is not available in openpyxl 3.x, so the ARGB value is used instead)
ft_red = Font(color="FFFF0000")
a1.font = ft_red
# you can also do it with the copy function (the older .copy(...) method is deprecated)
ft_red_bold = copy(ft_red)
ft_red_bold.bold = True
# you can copy from a cell's style and change the required attributes
ft_red_single_underline = copy(a1.font)
ft_red_single_underline.underline = "single"
d4.font = ft_red_bold
# apply style to column E
col_e = ws.column_dimensions['E']
col_e.font = ft_red_single_underline
A cell's style contains these attributes: font, fill, border, alignment, protection and number_format. Check openpyxl.styles.
All of them except number_format are objects and are created the same way; number_format is a plain string.
Some pre-defined number formats are available, number formats can also be defined in string type. Check openpyxl.styles.numbers.
from openpyxl.styles import numbers
# use pre-defined values
ws['T49'].number_format = numbers.FORMAT_GENERAL
ws.cell(row=2, column=4).number_format = numbers.FORMAT_DATE_XLSX15
# use strings
ws['T57'].number_format = 'General'
ws.cell(row=3, column=5).number_format = 'd-mmm-yy'
ws['E5'].number_format = '0.00'
ws['E50'].number_format = '0.00%'
ws['E100'].number_format = '_ * #,##0_ ;_ * -#,##0_ ;_ * "-"??_ ;_ #_ '
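Because number_format is just a string, applying a format to a whole column is a plain loop over the data rows. Here is a minimal sketch; the two-column "when"/"ratio" data is made up for the example, and the header row is left at its default format.

```python
from datetime import date

from openpyxl import Workbook
from openpyxl.styles import numbers

wb = Workbook()
ws = wb.active
ws.append(["when", "ratio"])
ws.append([date(2021, 3, 14), 0.125])
ws.append([date(2021, 7, 4), 0.5])

# number_format is a plain string, so the same value can be assigned cell by cell
for when_cell, ratio_cell in ws.iter_rows(min_row=2):
    when_cell.number_format = numbers.FORMAT_DATE_XLSX15  # pre-defined date format
    ratio_cell.number_format = "0.00%"  # custom format string
```

Only the display changes; the stored values are still the original date and float objects.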
As of openpyxl-1.7.0 you can do this too:
cell.style.fill.start_color.index = "FF124191"
I've got a couple of helper functions which set a style on a given cell - things like headers, footers etc.
You can define a common style once and then apply the same style to any cell or range.
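The define/apply code for this answer did not survive, so as a stand-in here is a minimal sketch using openpyxl's NamedStyle, which is one way to define a style once and reuse it (not necessarily what the original answer used). The style name "header" and its colors are made up for illustration.

```python
from openpyxl import Workbook
from openpyxl.styles import Border, Font, NamedStyle, PatternFill, Side

wb = Workbook()
ws = wb.active

# Define the style once
header = NamedStyle(name="header")
header.font = Font(bold=True, color="FFFFFFFF")
header.fill = PatternFill(fill_type="solid", start_color="FF4F81BD", end_color="FF4F81BD")
thin = Side(style="thin")
header.border = Border(left=thin, right=thin, top=thin, bottom=thin)
wb.add_named_style(header)  # register it so cells can refer to it by name

# Apply it to a single cell ...
ws["A1"] = "Name"
ws["A1"].style = "header"

# ... or to every cell in a range
for row in ws["B1:D1"]:
    for cell in row:
        cell.style = "header"
```

A named style is stored once in the workbook and referenced by name, so it also shows up in Excel's cell styles list.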
This worked for me (font colour + bold font):
from openpyxl import Workbook
from openpyxl.styles import Font
wb = Workbook()
ws = wb.active  # or wb['SheetName'] for an existing sheet in a loaded workbook
row_number, column_number = 1, 1  # the cell to style
ws.cell(row_number, column_number).font = Font(color="0000FF", bold=True)