openpyxl - adjust column width size - python

I have the following script, which converts a CSV file to an XLSX file, but the columns come out very narrow. Each time I have to drag them wider with the mouse to read the data. Does anybody know how to set the column width in openpyxl?
Here is the code I am using.
#!/usr/bin/python2.6
import csv
from openpyxl import Workbook
from openpyxl.cell import get_column_letter
f = open('users_info_cvs.txt', "rU")
csv.register_dialect('colons', delimiter=':')
reader = csv.reader(f, dialect='colons')
wb = Workbook()
dest_filename = r"account_info.xlsx"
ws = wb.worksheets[0]
ws.title = "Users Account Information"
for row_index, row in enumerate(reader):
    for column_index, cell in enumerate(row):
        column_letter = get_column_letter((column_index + 1))
        ws.cell('%s%s'%(column_letter, (row_index + 1))).value = cell
wb.save(filename = dest_filename)

You could estimate the width (or use a monospaced font) to achieve this. Let's assume data is a nested array like
[['a1','a2'],['b1','b2']]
We can get the maximum number of characters in each column and then set the width to that. The width unit matches one character of a monospaced font (at least if no other styles are changed). Even with a variable-width font it is a decent estimate. This will not work with formulas, because the length of the formula text is measured rather than the length of its result.
from openpyxl.utils import get_column_letter
column_widths = []
for row in data:
    for i, cell in enumerate(row):
        if len(column_widths) > i:
            if len(cell) > column_widths[i]:
                column_widths[i] = len(cell)
        else:
            column_widths += [len(cell)]
for i, column_width in enumerate(column_widths,1):  # ,1 to start at 1
    worksheet.column_dimensions[get_column_letter(i)].width = column_width
A bit of a hack but your reports will be more readable.
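A compact variant of the same idea, as a sketch only: it assumes data is a list of rows as above, converts every value with str so that non-string cells also work, and treats None as empty. The helper name max_column_widths is made up for illustration.

```python
from itertools import zip_longest


def max_column_widths(data):
    """Return the longest text length found in each column of `data`."""
    # zip_longest transposes the rows and pads short rows with None.
    columns = zip_longest(*data, fillvalue=None)
    return [
        max(len("" if value is None else str(value)) for value in column)
        for column in columns
    ]


data = [["a1", "a2"], ["b1", "b2-longer", 3.5]]
print(max_column_widths(data))  # [2, 9, 3]
```

The resulting list can be fed to the final loop of the answer above, that is enumerate(column_widths, 1) together with get_column_letter(i).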

My variation of Bufke's answer. Avoids a bit of branching with the array and ignores empty cells / columns.
Now fixed for non-string cell values.
# ws is your current worksheet
dims = {}
for row in ws.rows:
    for cell in row:
        if cell.value:
            dims[cell.column] = max((dims.get(cell.column, 0), len(str(cell.value))))
for col, value in dims.items():
    ws.column_dimensions[col].width = value
As of openpyxl version 3.0.3 you need to use
dims[cell.column_letter] = max((dims.get(cell.column_letter, 0), len(str(cell.value))))
because the openpyxl library raises a TypeError if you pass column_dimensions a number instead of a column letter. Everything else can stay the same.
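If the same script has to run against both older and newer openpyxl releases, one option is to look up the key defensively. This is only a sketch: column_key is a hypothetical helper, and it assumes that a cell exposes either column_letter, or a column attribute that is a letter (older releases) or a number (newer releases).

```python
from types import SimpleNamespace


def column_key(cell):
    """Return the column letter of `cell` for use with column_dimensions."""
    letter = getattr(cell, "column_letter", None)
    if letter is not None:
        return letter
    column = cell.column
    if isinstance(column, str):
        # Assumption: older releases stored the letter directly in `column`.
        return column
    # Numeric index without column_letter: let openpyxl translate it.
    from openpyxl.utils import get_column_letter
    return get_column_letter(column)


# Stand-in objects instead of real openpyxl cells, just to show the behaviour.
new_style = SimpleNamespace(column=3, column_letter="C")
old_style = SimpleNamespace(column="B")
print(column_key(new_style), column_key(old_style))  # C B
```

In the loop above this would replace cell.column, as in dims[column_key(cell)] = ...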

An even more Pythonic way to set the width of all columns, which works at least in openpyxl version 2.4.0:
for column_cells in worksheet.columns:
    length = max(len(as_text(cell.value)) for cell in column_cells)
    worksheet.column_dimensions[column_cells[0].column].width = length
The as_text function should be something that converts the value to a string of the proper length, for example in Python 3:
def as_text(value):
    if value is None:
        return ""
    return str(value)

With openpyxl 3.0.3 the best way to modify the columns is with the DimensionHolder object, which is a dictionary that maps each column to a ColumnDimension object.
ColumnDimension accepts parameters such as bestFit, auto_size (which is an alias of bestFit) and width.
For me, auto_size didn't work as expected, so I had to use width, and I found that a good width for a column is len(cell_value) * 1.23.
To get the value of each cell you would have to iterate over all of them. I didn't do that, because in my project I only had to write worksheets, so I took the longest string in each column directly from my data.
The example below just shows how to modify the column dimensions:
import openpyxl
from openpyxl.worksheet.dimensions import ColumnDimension, DimensionHolder
from openpyxl.utils import get_column_letter
wb = openpyxl.load_workbook("Example.xlsx")
ws = wb["Sheet1"]
dim_holder = DimensionHolder(worksheet=ws)
for col in range(ws.min_column, ws.max_column + 1):
    dim_holder[get_column_letter(col)] = ColumnDimension(ws, min=col, max=col, width=20)
ws.column_dimensions = dim_holder

I had a problem with merged cells, where auto-sizing did not work correctly. If you have the same problem, you can solve it with the following code:
for col in worksheet.columns:
    max_length = 0
    column = col[0].column  # Get the column name
    for cell in col:
        if cell.coordinate in worksheet.merged_cells:  # skip merged cells
            continue
        try:  # Necessary to avoid error on empty cells
            if len(str(cell.value)) > max_length:
                max_length = len(str(cell.value))
        except:
            pass
    adjusted_width = (max_length + 2) * 1.2
    worksheet.column_dimensions[column].width = adjusted_width

A slight improvement of the above accepted answer, that I think is more pythonic (asking for forgiveness is better than asking for permission)
from openpyxl.utils import get_column_letter
column_widths = []
for row in workSheet.iter_rows():
    for i, cell in enumerate(row):
        try:
            column_widths[i] = max(column_widths[i], len(str(cell.value)))
        except IndexError:
            column_widths.append(len(str(cell.value)))
for i, column_width in enumerate(column_widths):
    workSheet.column_dimensions[get_column_letter(i + 1)].width = column_width

We can convert a column number to its letter via the letter's ASCII code and use that as the key for column_dimensions (this works for columns A to Z only):
import openpyxl as xl
work_book = xl.load_workbook('file_location')
sheet = work_book['Sheet1']
column_number = 2
column = str(chr(64 + column_number))
sheet.column_dimensions[column].width = 20
work_book.save('file_location')
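Note that chr(64 + column_number) only covers columns 1 to 26 (A to Z); column 27 would give "[" instead of "AA". openpyxl.utils.get_column_letter handles the general case. As an illustration of the arithmetic involved, here is a small pure-Python sketch; column_letter is a made-up name, not an openpyxl function.

```python
def column_letter(number):
    """Convert a 1-based column number to its Excel letter(s), e.g. 27 -> 'AA'."""
    if number < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while number > 0:
        # Excel columns are bijective base 26: there is no zero digit,
        # so shift by one before taking the remainder.
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


print([column_letter(n) for n in (1, 26, 27, 52, 703)])
# ['A', 'Z', 'AA', 'AZ', 'AAA']
```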

Here is a more general, simplified solution for users new to the topic (not specific to the question).
If you want to change the width or the height of cells in openpyxl (version 3.0.9), you can do it simply by assigning to the width or height attributes via column_dimensions or row_dimensions.
import openpyxl
wb = openpyxl.Workbook()
sheet = wb["Sheet"]
sheet["A1"] = "Tall row"
sheet["B2"] = "Wide column"
# Change height of row 1
sheet.row_dimensions[1].height = 100
# Change width of column B
sheet.column_dimensions["B"].width = 50
wb.save("StackOverflow.xlsx")

This is my version, based on @Virako's code snippet:
from openpyxl.utils import get_column_letter
def adjust_column_width_from_col(ws, min_row, min_col, max_col):
    column_widths = []
    for i, col in enumerate(
        ws.iter_cols(min_col=min_col, max_col=max_col, min_row=min_row)
    ):
        for cell in col:
            value = cell.value
            if value is not None:
                if not isinstance(value, str):
                    value = str(value)
                try:
                    column_widths[i] = max(column_widths[i], len(value))
                except IndexError:
                    column_widths.append(len(value))
    for i, width in enumerate(column_widths):
        col_name = get_column_letter(min_col + i)
        value = column_widths[i] + 2
        ws.column_dimensions[col_name].width = value
It is used as follows:
adjust_column_width_from_col(ws, 1,1, ws.max_column)

All the above answers run into the issue that col[0].column returns a number, while worksheet.column_dimensions[column] only accepts a letter such as 'A', 'B' or 'C' as the column. I've modified @Virako's code and it is working fine now.
import openpyxl
from openpyxl.utils import get_column_letter
..
for col in _ws.columns:
    max_length = 0
    # Translate the numeric column index into its letter
    col_name = get_column_letter(col[0].column)
    for cell in col:
        try:
            if len(str(cell.value)) > max_length:
                max_length = len(str(cell.value))
        except:
            pass
    adjusted_width = (max_length+2)
    _ws.column_dimensions[col_name].width = adjusted_width

This is a dirty fix: openpyxl actually supports auto_fit, but there is no method to access the property, so it is set on each column dimension directly.
import openpyxl
from openpyxl.utils import get_column_letter
wb = openpyxl.load_workbook("Example.xlsx")
ws = wb["Sheet1"]
for i in range(1, ws.max_column+1):
    ws.column_dimensions[get_column_letter(i)].bestFit = True
    ws.column_dimensions[get_column_letter(i)].auto_size = True

Another approach without storing any state could be like this:
from itertools import chain
# Using `ws` as the Worksheet
for cell in chain.from_iterable(ws.iter_cols()):
    if cell.value:
        ws.column_dimensions[cell.column_letter].width = max(
            ws.column_dimensions[cell.column_letter].width,
            len(f"{cell.value}"),
        )

I had to change @User3759685's answer above to this after openpyxl was updated, because I was getting an error. @phihag reported this in the comments as well.
for column_cells in ws.columns:
    new_column_length = max(len(as_text(cell.value)) for cell in column_cells)
    new_column_letter = (openpyxl.utils.get_column_letter(column_cells[0].column))
    if new_column_length > 0:
        ws.column_dimensions[new_column_letter].width = new_column_length + 1

Compiling and applying multiple suggestions above, and extending merged cells detection to the horizontally merged cells only, I could offer this code:
from openpyxl.utils import get_column_letter
def adjust_width(ws):
    """
    Adjust width of the columns
    @param ws: worksheet
    @return: None
    """
    def is_merged_horizontally(cell):
        """
        Checks if cell is merged horizontally with another cell
        @param cell: cell to check
        @return: True if cell is merged horizontally with another cell, else False
        """
        cell_coor = cell.coordinate
        if cell_coor not in ws.merged_cells:
            return False
        for rng in ws.merged_cells.ranges:
            if cell_coor in rng and len(list(rng.cols)) > 1:
                return True
        return False
    for col_number, col in enumerate(ws.columns, start=1):
        col_letter = get_column_letter(col_number)
        # default=0 covers columns in which every cell is merged horizontally
        max_length = max(
            (len(str(cell.value or "")) for cell in col if not is_merged_horizontally(cell)),
            default=0,
        )
        adjusted_width = (max_length + 2) * 0.95
        ws.column_dimensions[col_letter].width = adjusted_width
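The horizontal-merge test itself does not depend on openpyxl. Here is a minimal sketch of the logic, assuming each merged range is given as a hypothetical (min_col, min_row, max_col, max_row) tuple of 1-based numbers; openpyxl's own range objects expose attributes with the same names.

```python
def is_merged_horizontally(col, row, merged_ranges):
    """True if (col, row) lies in a merged range spanning more than one column."""
    for min_col, min_row, max_col, max_row in merged_ranges:
        inside = min_col <= col <= max_col and min_row <= row <= max_row
        if inside and max_col > min_col:
            return True
    return False


# A1:C1 is merged across columns, D2:D4 only across rows.
merged = [(1, 1, 3, 1), (4, 2, 4, 4)]
print(is_merged_horizontally(2, 1, merged))  # True
print(is_merged_horizontally(4, 3, merged))  # False
```

Cells merged only vertically still count towards the column width, which is the distinction the function above is after.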

After updating from openpyxl 2.5.2a to 2.6.4 (the final version with Python 2.x support), I ran into the same issue when configuring the width of a column.
Basically, I always calculate the width for each column (dims is a dict holding each column's width):
dims[cell.column] = max((dims.get(cell.column, 0), len(str(cell.value))))
Afterwards I set the width to something slightly bigger than the original size, but now you have to pass the letter of the column rather than an int (col below is the column number and is translated to the right letter):
worksheet.column_dimensions[get_column_letter(col)].width = value +1
This fixes the error and assigns the right width to your column ;)
Hope this helps.

I made a function that is very fast with large Excel files because it uses pandas.read_excel
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter
def auto_adjust_column_width(file_path, sheet_name=0):
    column_widths = []
    df = pd.read_excel(file_path, sheet_name=sheet_name, header=None)
    for col in df.columns:
        max_length = int(df[col].astype(str).str.len().max() * 1.2)
        column_widths.append(max_length)
    wb = load_workbook(file_path)
    if isinstance(sheet_name, int):
        sheet_name = wb.sheetnames[sheet_name]
    worksheet = wb[sheet_name]
    for i, column_width in enumerate(column_widths):
        column = get_column_letter(i+1)
        worksheet.column_dimensions[column].width = column_width
    wb.save(file_path)

When this came up for me, I just did everything I wanted to do with openpyxl, saved the workbook, and opened it again with pywin32. Pywin32 has autofit built in without having to make a bunch of rules/conditions.
Edit: I should note that pywin32 only works with Windows.
from win32com.client import Dispatch
excel = Dispatch('Excel.Application')
wb = excel.Workbooks.Open("excelFile.xlsx")
excel.Worksheets(1).Activate()
excel.ActiveSheet.Columns.AutoFit()
wb.Save()
wb.Close()
excel.Quit()
I did add a rule, however, because I had one text column that had some long values I didn't need to show. I limited any column to 75 characters.
excel = Dispatch('Excel.Application')
wb = excel.Workbooks.Open("excelFile.xlsx")
excel.Worksheets(1).Activate()
excel.ActiveSheet.Columns.AutoFit()
for col in excel.ActiveSheet.Columns:
    if col.ColumnWidth > 75:
        col.ColumnWidth = 75
wb.Save()
wb.Close()
excel.Quit()

Just insert the code below into your file:
# Importing the necessary modules
try:
    from openpyxl.cell import get_column_letter
except ImportError:
    from openpyxl.utils import get_column_letter
from openpyxl.utils import column_index_from_string
from openpyxl import load_workbook
import openpyxl
from openpyxl import Workbook
for column_cells in sheet.columns:
    new_column_length = max(len(str(cell.value)) for cell in column_cells)
    new_column_letter = (get_column_letter(column_cells[0].column))
    if new_column_length > 0:
        sheet.column_dimensions[new_column_letter].width = new_column_length*1.23

Here is an answer for Python 3.8 and OpenPyXL 3.0.0.
I tried to avoid using the get_column_letter function but failed.
This solution uses the newly introduced assignment expressions aka "walrus operator":
import openpyxl
from openpyxl.utils import get_column_letter
workbook = openpyxl.load_workbook("myxlfile.xlsx")
worksheet = workbook["Sheet1"]
MIN_WIDTH = 10
for i, column_cells in enumerate(worksheet.columns, start=1):
    width = (
        length
        if (length := max(len(str(cell_value) if (cell_value := cell.value) is not None else "")
                          for cell in column_cells)) >= MIN_WIDTH
        else MIN_WIDTH
    )
    worksheet.column_dimensions[get_column_letter(i)].width = width

Since openpyxl 2.6.1, the column letter, not the column number, is required when setting the width.
for column in sheet.columns:
    length = max(len(str(cell.value)) for cell in column)
    length = length if length <= 16 else 16
    sheet.column_dimensions[column[0].column_letter].width = length
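A lower and an upper limit can be combined in one helper. This is a sketch, not part of openpyxl: bounded_width and its defaults (10 from the MIN_WIDTH answer above, 16 from this one) are illustrative, and empty cells are skipped so that None does not count as four characters.

```python
def bounded_width(values, min_width=10, max_width=16):
    """Longest text length among `values`, clamped to [min_width, max_width]."""
    longest = max(
        (len(str(value)) for value in values if value is not None),
        default=0,
    )
    return max(min_width, min(longest, max_width))


print(bounded_width(["id", None, 42]))              # 10
print(bounded_width(["a rather long cell value"]))  # 16
```

Inside the loop above it could be called as bounded_width(cell.value for cell in column).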

Related

How can I highlight in one color the cells that changed for a same row between two dataframes and in another color what is new? Different Shape

I tried many functions and tried to apply existing solutions to get the output I want, yet I seem unable to get an Excel output at the end that keeps the formatting I try to apply.
It seems that all the existing pandas functions only work on identically labelled indexes or on files of the same shape. In my situation the shapes of the two files are (757,26) for, let's say, file1 and (688,39) for file2; the first 26 columns are labelled the same way in file1 and file2.
Is there a way to merge these two files, highlight the differences as indicated in the title, and create an Excel output with the formatting still present?
Here is what I tried:
import pandas as pd
import numpy as np
import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl import Workbook
import pandas.io.formats.style as style
dfcurr=pd.read_excel(r'IPH4201P2 - DRATracker_Current.xlsx')
dfprev=pd.read_excel(r'IPH4201P2 - DRATracker_Previous.xlsx')
dfprev=dfprev.loc[(dfprev['Subject'].str.contains('M'))|(dfprev['Subject'].str.contains('S'))]
dfprev=dfprev.reset_index()
dfprev=dfprev.drop(columns='index')
df_diff=pd.merge(dfcurr,dfprev,how='left',indicator=True)
common_columns = df_diff.columns.intersection(dfprev.columns)
compare_df = df_diff[common_columns].eq(dfprev[common_columns])
compare_df.to_excel('comp.xlsx')
# Convert dataframe to string
df_diff = df_diff.astype(str)
def highlight_diff(data, compare):
    if type(data) != pd.DataFrame:
        data = pd.DataFrame(data)
    if type(compare) != pd.DataFrame:
        compare = pd.DataFrame(compare)
    result = []
    for col in data.columns:
        if col in compare.columns and (data[col] != compare[col]).any():
            result.append('background-color: #DAEEF3')
        elif col not in compare.columns:
            result.append('background-color: #E4DFEC')
        else:
            result.append('background-color: white')
    return result
# Create a new workbook and add a new worksheet
wb = Workbook()
ws = wb.active
# Write the dataframe to the worksheet
for r in dataframe_to_rows(df_diff.style.apply(highlight_diff, compare=compare_df).data, index=False, header=True):
    ws.append(r)
# Save the workbook
wb.save('Merged_style.xlsx')
However, I do not get an output with the style applied; no cells are highlighted in the color I want them to be highlighted in.
Edit:
I tried a different approach to highlight the cells in the excel, the function used for this approach comes from here:
import pandas as pd
import numpy as np
import openpyxl
import pandas.io.formats.style as style
dfcurr=pd.read_excel(r'IPH4201P2 - DRATracker_Current.xlsx')
dfprev=pd.read_excel(r'IPH4201P2 - DRATracker_Previous.xlsx')
dfprev=dfprev.loc[(dfprev['Subject'].str.contains('M'))|(dfprev['Subject'].str.contains('S'))]
dfprev=dfprev.reset_index()
dfprev=dfprev.drop(columns='index')
new='background-color: #DAEEF3'
change='background-color: #E4DFEC'
df_diff=pd.merge(dfcurr,dfprev,on=['Subject','Visit','Visit Date','Site\nID','Cohort','Pathology','Clinical\nStage At\nScreening','TNMBA at\nScreening'],how='left',indicator=True)
for col in df_diff.columns:
    if '_y' in col:
        del df_diff[col]
    elif 'Unnamed: 1' in col:
        del df_diff[col]
    elif '_x' in col:
        df_diff.columns=df_diff.columns.str.rstrip('_x')
def highlight_diff(data, other, color='#DAEEF3'):
    # Define html attribute
    attr = 'background-color: {}'.format(color)
    # Where data != other set attribute
    return pd.DataFrame(np.where(data.ne(other), attr, ''),
                        index=data.index, columns=data.columns)
# Set axis=None so it passes the entire frame
df_diff=df_diff.style.apply(highlight_diff, axis=None, other=dfprev)
print(type(df_diff))
df_diff.to_excel('Diff.xlsx',engine='openpyxl',index=0)
This new method provides me with an excel file where the style is applied, how can I update it to apply the color #DAEEF3 to rows in df_diff where if the Subject, Visit and Visit Date are not present in the dataframe dfprev, and apply the color #E4DFEC to cells that differs between the two files for matching Subject, Visit and Visit Date?
This code isn't doing anything:
df_diff.style.apply(highlight_diff, compare=compare_df).data
df_diff.style creates a Styler object.
.apply applies a function to that Styler so that it attaches the relevant HTML styling, which it stores as a mapping in the Styler context.
.data then just retrieves the original DataFrame object that you created the Styler object with. It has nothing to do with the HTML styling contexts you created for the Styler, so you are effectively discarding them with this final .data addition.
Styler has its own to_excel method which interprets some of that HTML styling context and converts it to Excel cell coloring and formatting.
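To make the difference concrete, here is a minimal sketch with two tiny made-up frames (curr and prev are illustrative and identical in shape and labels). The style function returns a frame of CSS strings; the Styler has to be kept, not unwrapped with .data, for to_excel to emit the fills. Writing the file additionally needs openpyxl and jinja2 installed, which is why that step sits in a function that is not called here.

```python
import numpy as np
import pandas as pd


def highlight_diff(data, other, color="#E4DFEC"):
    """Return a frame of CSS strings marking the cells where `data` != `other`."""
    attr = f"background-color: {color}"
    return pd.DataFrame(
        np.where(data.ne(other), attr, ""),
        index=data.index,
        columns=data.columns,
    )


def write_highlighted(data, other, path):
    """Write `data` to `path`, keeping the Styler so the colours survive."""
    styler = data.style.apply(highlight_diff, axis=None, other=other)
    styler.to_excel(path, engine="openpyxl", index=False)  # note: no `.data`


curr = pd.DataFrame({"Subject": ["M1", "S2"], "Visit": [1, 3]})
prev = pd.DataFrame({"Subject": ["M1", "S2"], "Visit": [1, 2]})
styles = highlight_diff(curr, prev)
print(styles.loc[1, "Visit"])  # background-color: #E4DFEC
```

Calling write_highlighted(curr, prev, "Diff.xlsx") would then produce a file in which only the changed Visit cell is filled.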
After asking around to people I know that had to do something similar, here is the final code that produces the expected output:
import pandas as pd
from openpyxl.styles import PatternFill
from openpyxl import load_workbook
#DATA FILES##################################################
#Set below to False to copy comments manually and keep the comment formatting.
copy_comments_automatically = True
#Update folderPath
folderPath = "C:/Users/G.Tielemans/OneDrive - Medpace/Documents/Innate/Script/DRA/"
#File names must match and files must be closed when running
dfcurr = pd.read_excel(folderPath + "IPH4201P2 - DRATracker_Current.xlsx")
dfprev = pd.read_excel(folderPath + "IPH4201P2 - DRATracker_Previous.xlsx")
#############################################################
#LOADING DATA################################################
dfprev = dfprev.loc[(dfprev['Subject'].str.contains('M'))|(dfprev['Subject'].str.contains('S'))]
dfprev = dfprev.reset_index()
dfprev = dfprev.drop(columns='index')
dfprevComments = dfprev.iloc[:, 29:]
#############################################################
#NEW LINES###################################################
def highlightNewLines(linecurr):
    currSubject = linecurr["Subject"]
    currVisit = linecurr["Visit"]
    currVisitDate = linecurr["Visit Date"]
    for index, row in dfprev.iterrows():
        if currSubject == row["Subject"] and currVisit == row["Visit"] and currVisitDate == row["Visit Date"]:
            return True
    return False
dfcurr["Duplicate?"] = dfcurr.apply(lambda row: highlightNewLines(row), axis = 1)
#############################################################
#FIND UPDATES################################################
dfDupes = dfcurr[dfcurr["Duplicate?"] == True]
dfDupeslen = len(dfDupes)
#indexes of new lines to paste at bottom of file and color
indexes = dfcurr[dfcurr["Duplicate?"] == False].index
dfDupes = dfDupes.drop("Duplicate?", axis = 1)
dfDupes = dfDupes.reset_index(drop=True)
dfprev = dfprev.iloc[:,0:26]
dfprev = dfprev.reset_index(drop=True)
difference = dfDupes[dfDupes!=dfprev]
#############################################################
#ATTACH NEW FINDINGS AND PASTE MEDPACE COMMENT COLUMNS#######
newfindings = dfcurr.loc[indexes]
newfindings = newfindings.drop("Duplicate?", axis = 1)
dfDupes = pd.concat([dfDupes, newfindings])
dfDupes = dfDupes.reset_index(drop=True)
dflen = len(dfDupes)
if copy_comments_automatically:
    dfDupes = pd.concat([dfDupes, dfprevComments], axis=1)
#############################################################
#COLORING####################################################
dfDupes.to_excel(folderPath + "IPH4201P2 - DRATracker_Output.xlsx", index = False)
wb = load_workbook(folderPath + "IPH4201P2 - DRATracker_Output.xlsx")
ws = wb.active
fillred = PatternFill(start_color="ffc7ce", end_color="ffc7ce", fill_type = "solid")
fillblue = PatternFill(start_color="99ccff", end_color="99ccff", fill_type = "solid")
for row in range(len(difference)):
    for column in range(len(difference.columns)):
        if pd.isnull(difference.iloc[row, column]) == False:
            ws.cell(row+2, column+1).fill = fillred
for row in range(dfDupeslen, dflen):
    for column in [2,5,6]:
        ws.cell(row+2, column).fill = fillblue
wb.save(folderPath + "IPH4201P2 - DRATracker_Output.xlsx")
#############################################################
print("Done")

adjust column width size using openpyxl

I'm having trouble adjusting the column widths of the Excel file below. I'm using this code.
from openpyxl.utils import get_column_letter
ws.delete_cols(1) #remove col 'A' which has pandas index no.
for column in ws.iter_cols():
    name = get_column_letter(column[0].column)
    new_col_length = max(len(str(cell.value)) for cell in column)
    #ws.column_dimensions[name].bestFit = True #I tried this but the result is same
    ws.column_dimensions[name].width = new_col_length
Excel sheet: [image]
The output I'm getting: [image]
Something like this should manage the deletion of column A using Openpyxl
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter
path = 'col_width.xlsx'
wb = load_workbook(path)
ws = wb['Sheet1']
remerge_cells_list = []
# Remove existing merged cells, and
# Add the calculated new cell coords to remerge_cells_list to re-add after
# the deletion of column A.
for unmerge in range(len(ws.merged_cells.ranges)):
    current_merge = ws.merged_cells.ranges[0]
    new_min_col = get_column_letter(current_merge.min_col-1)
    new_max_col = get_column_letter(current_merge.max_col-1)
    remerge_cells_list.append(new_min_col + str(current_merge.min_row) + ':'
                              + new_max_col + str(current_merge.max_row))
    print("Removing merge: " + current_merge.coord)
    ws.unmerge_cells(current_merge.coord)
print("\nDeleting column A\n")
ws.delete_cols(1) #remove col 'A' which has pandas index no.
# Set the column width dimensions
for column in ws.iter_cols():
    name = get_column_letter(column[0].column)
    new_col_length = max(len(str(cell.value)) for cell in column)
    # ws.column_dimensions[name].bestFit = True #I tried this but the result is same
    ws.column_dimensions[name].width = new_col_length+2 # Added an extra bit for padding
# Re-merge the cells from the remerge_cells_list
# Don't think it matters if this is done before or after resizing the columns
for merge in remerge_cells_list:
    print("Add adjusted cell merge: " + merge)
    ws.merge_cells(merge)
wb.save('col_width_out.xlsx')
After many hours of research finally, I found it.
NOTE : In the below code, sheet is the worksheet name. Usually in the documentation, we can see it as ws. Please don't forget to change the worksheet name.
# Importing the necessary modules
try:
    from openpyxl.cell import get_column_letter
except ImportError:
    from openpyxl.utils import get_column_letter
from openpyxl.utils import column_index_from_string
from openpyxl import load_workbook
import openpyxl
from openpyxl import Workbook
for column_cells in sheet.columns:
    new_column_length = max(len(str(cell.value)) for cell in column_cells)
    new_column_letter = (get_column_letter(column_cells[0].column))
    if new_column_length > 0:
        sheet.column_dimensions[new_column_letter].width = new_column_length*1.23
UPDATE : This code doesn't work for all, but don't hesitate to try it..

Custom Excel column using pandas [duplicate]

I am being asked to generate some Excel reports. I am currently using pandas quite heavily for my data, so naturally I would like to use the pandas.ExcelWriter method to generate these reports. However the fixed column widths are a problem.
The code I have so far is simple enough. Say I have a dataframe called df:
writer = pd.ExcelWriter(excel_file_path, engine='openpyxl')
df.to_excel(writer, sheet_name="Summary")
I was looking over the pandas docs, and I don't really see any options to set column widths. Is there a trick to make it such that the columns auto-adjust to the data? Or is there something I can do after the fact to the xlsx file to adjust the column widths?
(I am using the OpenPyXL library, and generating .xlsx files - if that makes any difference.)
Inspired by user6178746's answer, I have the following:
# Given a dict of dataframes, for example:
# dfs = {'gadgets': df_gadgets, 'widgets': df_widgets}
writer = pd.ExcelWriter(filename, engine='xlsxwriter')
for sheetname, df in dfs.items():  # loop through `dict` of dataframes
    df.to_excel(writer, sheet_name=sheetname)  # send df to writer
    worksheet = writer.sheets[sheetname]  # pull worksheet object
    for idx, col in enumerate(df):  # loop through all columns
        series = df[col]
        max_len = max((
            series.astype(str).map(len).max(),  # len of largest item
            len(str(series.name))  # len of column name/header
            )) + 1  # adding a little extra space
        worksheet.set_column(idx, idx, max_len)  # set column width
writer.save()
Dynamically adjust all the column lengths
writer = pd.ExcelWriter('/path/to/output/file.xlsx')
df.to_excel(writer, sheet_name='sheetName', index=False, na_rep='NaN')
for column in df:
    column_length = max(df[column].astype(str).map(len).max(), len(column))
    col_idx = df.columns.get_loc(column)
    writer.sheets['sheetName'].set_column(col_idx, col_idx, column_length)
writer.save()
Manually adjust a column using Column Name
col_idx = df.columns.get_loc('columnName')
writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)
Manually adjust a column using Column Index
writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)
In case any of the above is failing with
AttributeError: 'Worksheet' object has no attribute 'set_column'
make sure to install xlsxwriter:
pip install xlsxwriter
For a more comprehensive explanation you can read the article How to Auto-Adjust the Width of Excel Columns with Pandas ExcelWriter on TDS.
I'm posting this because I just ran into the same issue and found that the official documentation for Xlsxwriter and pandas still have this functionality listed as unsupported. I hacked together a solution that solved the issue i was having. I basically just iterate through each column and use worksheet.set_column to set the column width == the max length of the contents of that column.
One important note, however. This solution does not fit the column headers, simply the column values. That should be an easy change though if you need to fit the headers instead. Hope this helps someone :)
import pandas as pd
import sqlalchemy as sa
import urllib
read_server = 'serverName'
read_database = 'databaseName'
read_params = urllib.quote_plus("DRIVER={SQL Server};SERVER="+read_server+";DATABASE="+read_database+";TRUSTED_CONNECTION=Yes")
read_engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % read_params)
#Output some SQL Server data into a dataframe
my_sql_query = """ SELECT * FROM dbo.my_table """
my_dataframe = pd.read_sql_query(my_sql_query,con=read_engine)
#Set destination directory to save excel.
xlsFilepath = r'H:\my_project' + "\\" + 'my_file_name.xlsx'
writer = pd.ExcelWriter(xlsFilepath, engine='xlsxwriter')
#Write excel to file using pandas to_excel
my_dataframe.to_excel(writer, startrow = 1, sheet_name='Sheet1', index=False)
#Indicate workbook and worksheet for formatting
workbook = writer.book
worksheet = writer.sheets['Sheet1']
#Iterate through each column and set the width == the max length in that column. A padding length of 2 is also added.
for i, col in enumerate(my_dataframe.columns):
    # find length of column i
    column_len = my_dataframe[col].astype(str).str.len().max()
    # Setting the length if the column header is larger
    # than the max column value length
    column_len = max(column_len, len(col)) + 2
    # set the column length
    worksheet.set_column(i, i, column_len)
writer.save()
There is a nice package that I started to use recently called StyleFrame.
It takes a DataFrame and lets you style it very easily.
By default the column widths auto-adjust.
For example:
from StyleFrame import StyleFrame
import pandas as pd
df = pd.DataFrame({'aaaaaaaaaaa': [1, 2, 3],
'bbbbbbbbb': [1, 1, 1],
'ccccccccccc': [2, 3, 4]})
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(excel_writer=excel_writer, row_to_add_filters=0,
columns_and_rows_to_freeze='B2')
excel_writer.save()
You can also change the column widths:
sf.set_column_width(columns=['aaaaaaaaaaa', 'bbbbbbbbb'],
width=35.3)
UPDATE 1
In version 1.4 best_fit argument was added to StyleFrame.to_excel.
See the documentation.
UPDATE 2
Here's a sample of code that works for StyleFrame 3.x.x
from styleframe import StyleFrame
import pandas as pd
columns = ['aaaaaaaaaaa', 'bbbbbbbbb', 'ccccccccccc', ]
df = pd.DataFrame(data={
'aaaaaaaaaaa': [1, 2, 3, ],
'bbbbbbbbb': [1, 1, 1, ],
'ccccccccccc': [2, 3, 4, ],
}, columns=columns,
)
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(
excel_writer=excel_writer,
best_fit=columns,
columns_and_rows_to_freeze='B2',
row_to_add_filters=0,
)
excel_writer.save()
There is probably no automatic way to do it right now, but as you use openpyxl, the following line (adapted from another answer by user Bufke on how to do it manually) allows you to specify a sane value (in character widths):
writer.sheets['Summary'].column_dimensions['A'].width = 15
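Building on that, the per-column width can be derived from the frame itself. This is a sketch under assumptions: frame_widths and apply_widths are hypothetical helper names, the frame is written with index=False starting at column A, and the width is just the longest text length (header included) plus a little padding.

```python
import pandas as pd


def frame_widths(df, padding=2):
    """Longest text length per column (header included), plus padding."""
    widths = []
    for name in df.columns:
        longest_cell = df[name].astype(str).str.len().max() if len(df) else 0
        widths.append(int(max(longest_cell, len(str(name)))) + padding)
    return widths


def apply_widths(worksheet, widths):
    """Set the widths on an openpyxl worksheet, first value for column A."""
    from openpyxl.utils import get_column_letter  # imported lazily

    for number, width in enumerate(widths, start=1):
        worksheet.column_dimensions[get_column_letter(number)].width = width


df = pd.DataFrame({"name": ["Alice", "Bartholomew"], "n": [1, 22]})
print(frame_widths(df))  # [13, 4]
```

With the writer from the question this would be called as apply_widths(writer.sheets['Summary'], frame_widths(df)) after df.to_excel(writer, sheet_name='Summary', index=False) and before the file is saved.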
By using pandas and xlsxwriter you can do your task, below code will perfectly work in Python 3.x. For more details on working with XlsxWriter with pandas this link might be useful https://xlsxwriter.readthedocs.io/working_with_pandas.html
import pandas as pd
writer = pd.ExcelWriter(excel_file_path, engine='xlsxwriter')
df.to_excel(writer, sheet_name="Summary")
workbook = writer.book
worksheet = writer.sheets["Summary"]
#set the column width as per your requirement
worksheet.set_column('A:A', 25)
writer.save()
I found that it was more useful to adjust the column width based on the column header rather than the column content.
Using df.columns.values.tolist() I generate a list of the column headers and use the lengths of these headers to determine the width of the columns.
See full code below:
import pandas as pd
import xlsxwriter
writer = pd.ExcelWriter(filename, engine='xlsxwriter')
df.to_excel(writer, index=False, sheet_name=sheetname)
workbook = writer.book # Access the workbook
worksheet= writer.sheets[sheetname] # Access the Worksheet
header_list = df.columns.values.tolist() # Generate list of headers
for i in range(0, len(header_list)):
    worksheet.set_column(i, i, len(header_list[i])) # Set column widths based on len(header)
writer.save() # Save the excel file
At work, I am always writing dataframes to Excel files. So instead of writing the same code over and over, I have created a module. Now I just import it and use it to write and format the Excel files. There is one downside though: it takes a long time if the dataframe is extra large.
So here is the code:
def result_to_excel(output_name, dataframes_list, sheet_names_list, output_dir):
out_path = os.path.join(output_dir, output_name)
writerReport = pd.ExcelWriter(out_path, engine='xlsxwriter',
datetime_format='yyyymmdd', date_format='yyyymmdd')
workbook = writerReport.book
# loop through the list of dataframes to save every dataframe into a new sheet in the excel file
for i, dataframe in enumerate(dataframes_list):
sheet_name = sheet_names_list[i] # choose the sheet name from sheet_names_list
dataframe.to_excel(writerReport, sheet_name=sheet_name, index=False, startrow=0)
# Add a header format.
format = workbook.add_format({
'bold': True,
'border': 1,
'fg_color': '#0000FF',
'font_color': 'white'})
# Write the column headers with the defined format.
worksheet = writerReport.sheets[sheet_name]
for col_num, col_name in enumerate(dataframe.columns.values):
worksheet.write(0, col_num, col_name, format)
worksheet.autofilter(0, 0, 0, len(dataframe.columns) - 1)
worksheet.freeze_panes(1, 0)
# loop through the columns in the dataframe to get the width of the column
for j, col in enumerate(dataframe.columns):
max_width = max([len(str(s)) for s in dataframe[col].values] + [len(col) + 2])
# define a max width to not get to wide column
if max_width > 50:
max_width = 50
worksheet.set_column(j, j, max_width)
writerReport.save()
return output_dir + output_name
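The width rule used in these answers (longest header or value, plus some padding, capped at a maximum) can be separated from the Excel writing. The following is only a minimal sketch of that rule in plain Python; the function name `column_widths` and the parameters `padding` and `cap` are made up for illustration and are not part of pandas or XlsxWriter. It assumes every row has as many values as there are headers.

```python
def column_widths(headers, rows, padding=2, cap=50):
    """Return one width per column: longest header or value, plus padding, capped."""
    # Start from the header lengths.
    widths = [len(str(h)) for h in headers]
    for row in rows:
        for i, value in enumerate(row):
            # Empty cells count as zero characters.
            text = "" if value is None else str(value)
            widths[i] = max(widths[i], len(text))
    # Add the padding, then clamp so one long cell cannot blow up the column.
    return [min(w + padding, cap) for w in widths]


headers = ["name", "comment"]
rows = [["alice", "x" * 80], ["bob", None]]
widths = column_widths(headers, rows)
print(widths)
```

With an XlsxWriter worksheet, each result could then be applied with worksheet.set_column(i, i, width) for column index i.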
Combining the other answers and comments and also supporting multi-indices:
def autosize_excel_columns(worksheet, df):
    # Size the index columns first, then the data columns shifted right by the number of index levels
    autosize_excel_columns_df(worksheet, df.index.to_frame())
    autosize_excel_columns_df(worksheet, df, offset=df.index.nlevels)

def autosize_excel_columns_df(worksheet, df, offset=0):
    for idx, col in enumerate(df):
        series = df[col]
        max_len = max((
            series.astype(str).map(len).max(),
            len(str(series.name))
        )) + 1
        worksheet.set_column(idx+offset, idx+offset, max_len)

sheetname=...
df.to_excel(writer, sheet_name=sheetname, freeze_panes=(df.columns.nlevels, df.index.nlevels))
worksheet = writer.sheets[sheetname]
autosize_excel_columns(worksheet, df)
writer.close()  # writer.save() was removed in pandas 2.0
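To see what the offset does, it can help to compute the widths without writing any file. This is a rough sketch, assuming pandas is installed; `frame_widths` and its `padding` parameter are hypothetical names used only here. It returns one width per Excel column in the order pandas writes them: index levels first, then data columns.

```python
import pandas as pd


def frame_widths(df, padding=1):
    """Return one width per written Excel column: index levels first, then data columns."""
    # index=False turns the index levels into ordinary columns of a new frame.
    parts = [df.index.to_frame(index=False), df]
    widths = []
    for part in parts:
        for col in part.columns:
            longest_value = part[col].astype(str).map(len).max()
            widths.append(int(max(longest_value, len(str(col)))) + padding)
    return widths


index = pd.MultiIndex.from_tuples([("north", 1), ("south", 22)], names=["region", "id"])
df = pd.DataFrame({"amount": [1234.5, 7.0]}, index=index)
widths = frame_widths(df)
print(widths)
```

Position k in the returned list corresponds to sheet column k, so the first df.index.nlevels entries belong to the index columns.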
You can solve the problem by calling the following function, where df is the dataframe whose columns you want to size and sheetname is the Excel sheet where the changes should take place. It assumes writer is an existing pd.ExcelWriter (xlsxwriter engine) that df has already been written to.
def auto_width_columns(df, sheetname):
    worksheet = writer.sheets[sheetname]  # writer comes from the enclosing scope
    # i matches the sheet column only if df was written with index=False
    for i, col in enumerate(df.columns):
        column_len = max(df[col].astype(str).str.len().max(), len(str(col)) + 2)
        worksheet.set_column(i, i, column_len)
import openpyxl
..
for col in _ws.columns:
    max_length = 0
    col_name = col[0].column_letter  # e.g. 'A'; available in openpyxl 2.6+
    for cell in col:
        # compare and store the length of the string form, so numbers work too
        if cell.value is not None:
            max_length = max(max_length, len(str(cell.value)))
    adjusted_width = max_length + 2
    _ws.column_dimensions[col_name].width = adjusted_width
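For the pure openpyxl route, here is a small self-contained sketch that builds a workbook in memory and sizes its columns. It assumes openpyxl 2.6 or later, where cell.column is an integer; the helper name `autosize_columns` and the `padding` parameter are my own, not part of openpyxl. Character counts are only an estimate of the rendered width.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter


def autosize_columns(ws, padding=2):
    """Set each column's width to its longest str(value) plus padding."""
    longest = {}
    for row in ws.iter_rows():
        for cell in row:
            if cell.value is not None:
                length = len(str(cell.value))
                longest[cell.column] = max(longest.get(cell.column, 0), length)
    for col_idx, length in longest.items():
        # column_dimensions is keyed by column letter, not by index.
        ws.column_dimensions[get_column_letter(col_idx)].width = length + padding


wb = Workbook()
ws = wb.active
ws.append(["id", "description"])
ws.append([1, "a fairly long description"])
autosize_columns(ws)
```

Calling wb.save(...) afterwards would persist the widths in the file.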
Yes, there is something you can do to the xlsx file afterwards to adjust the column widths.
Use xlwings to autofit columns. It's a pretty simple solution, see the last six lines of the example code. The advantage of this procedure is that you don't have to worry about font size, font type or anything else.
Requirement: Excel installation.
import pandas as pd
import xlwings as xw
path = r"test.xlsx"
# Export your dataframe in question.
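# Placeholder data; pd._testing.makeDataFrame() was a private helper and is gone from recent pandas.
df = pd.DataFrame({"A": [1.5, 2.5], "B": ["short", "a much longer text value"]})
df.to_excel(path)
# Autofit all columns with xlwings.
with xw.App(visible=False) as app:
    wb = xw.Book(path)
    for ws in wb.sheets:
        ws.autofit(axis="columns")
    wb.save(path)
    wb.close()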
The easiest solution is to specify a fixed column width in the set_column method.
for worksheet in writer.sheets.values():
    worksheet.set_column(0, last_column_value, required_width_constant)
This function works for me and also fixes the index width. It relies on pandas being imported as pd and on output_folder and prefix_excel_save being defined elsewhere.
def write_to_excel(writer, X, sheet_name, sep_only=False):
    #writer=writer object
    #X=dataframe
    #sheet_name=name of sheet
    #sep_only=True:write only as separate excel file, False: write as sheet to the writer object
    if sheet_name=="":
        print("specify sheet_name!")
    else:
        X.to_excel(f"{output_folder}{prefix_excel_save}_{sheet_name}.xlsx")
        if not sep_only:
            X.to_excel(writer, sheet_name=sheet_name)
            #fix column widths
            worksheet = writer.sheets[sheet_name] # pull worksheet object
            for idx, col in enumerate(X.columns): # loop through all columns
                series = X[col]
                max_len = max((
                    series.astype(str).map(len).max(), # len of largest item
                    len(str(series.name)) # len of column name/header
                )) + 1 # adding a little extra space
                worksheet.set_column(idx+1, idx+1, max_len) # set column width (+1 because the index occupies column 0)
            #fix index width
            max_len=pd.Series(X.index.values).astype(str).map(len).max()+1
            worksheet.set_column(0, 0, max_len)
        if sep_only:
            print(f'{sheet_name} is written as separate file')
        else:
            print(f'{sheet_name} is written as separate file')
            print(f'{sheet_name} is written as sheet')
    return writer
Call example:
writer = write_to_excel(writer, dataframe, "Statistical_Analysis")
I may be a bit late to the party, but this code works when using 'openpyxl' as your engine; sometimes pip install xlsxwriter won't solve the issue. The code below works like a charm. Edit any part as you wish.
import functools
from decimal import Decimal
import openpyxl

def text_length(text):
    """
    Get the effective text length in characters, taking into account newlines
    """
    if not text:
        return 0
    lines = str(text).split("\n")
    return max(len(line) for line in lines)

def _to_str_for_length(v, decimals=3):
    """
    Like str() but rounds decimals to predefined length
    """
    if isinstance(v, float):
        # Round to [decimal] places
        return str(Decimal(v).quantize(Decimal('1.' + '0' * decimals)).normalize())
    else:
        return str(v)

def auto_adjust_xlsx_column_width(df, writer, sheet_name, margin=3, length_factor=1.0, decimals=3, index=False):
    sheet = writer.sheets[sheet_name]
    _to_str = functools.partial(_to_str_for_length, decimals=decimals)
    # Compute & set column width for each column
    for column_name in df.columns:
        # Convert the values of the column to strings and take the longest one (header included);
        # the + 5 is fixed extra padding on top of margin
        column_length = max(df[column_name].apply(_to_str).map(text_length).max(), text_length(column_name)) + 5
        # Get index of column in XLSX
        # Column index is +1 if we also export the index column
        col_idx = df.columns.get_loc(column_name)
        if index:
            col_idx += 1
        # Set width of column to (column_length + margin)
        sheet.column_dimensions[openpyxl.utils.cell.get_column_letter(col_idx + 1)].width = column_length * length_factor + margin
    # Compute column width of index column (if enabled)
    if index: # If the index column is being exported
        index_length = max(df.index.map(_to_str).map(text_length).max(), text_length(df.index.name))
        sheet.column_dimensions["A"].width = index_length * length_factor + margin
An openpyxl version based on @alichaudry's code.
The code 1) loads an Excel file, 2) adjusts column widths and 3) saves it.
import pandas as pd

def auto_adjust_column_widths(excel_file : "Excel File Path", extra_space = 1) -> None:
    """
    Adjusts column widths of the excel file and replaces it with the adjusted one.
    Adjusting columns is based on the lengths of columns values (including column names).

    Parameters
    ----------
    excel_file :
        excel_file to adjust column widths.
    extra_space :
        extra column width in addition to the value-based-widths
    """
    from openpyxl import load_workbook
    from openpyxl.utils import get_column_letter
    wb = load_workbook(excel_file)
    for ws in wb:
        df = pd.DataFrame(ws.values)
        # Length of every cell's string form; .items() replaces iteritems(), which pandas 2.0 removed
        lengths = df.astype(str).apply(lambda s: s.map(len))
        for i, r in (lengths.max(axis=0) + extra_space).items():
            ws.column_dimensions[get_column_letter(i+1)].width = r
    wb.save(excel_file)

Change dates format in one column (replace, insert column, append), any way to update? ..it must be simple

I need to change the date format in an Excel column.
I can update a single cell, but I am stuck when it comes to updating the whole column with "proper_date".
wb = load_workbook(...)
ws = wb['Lista']
daty_wystawienia = ws['G']
# This works, but ws.append writes the values into the first column, below the existing data
for daty in daty_wystawienia:
    date_string = daty.value
    if re.search('[0-9-]', str(date_string)):
        proper_date = datetime.datetime.strptime(date_string, '%d-%m-%Y').strftime('%y.%m.%d')
        for row in range(1):
            ws.append([proper_date])
# Tried to make the last line daty_wystawienia.append([proper_date]) but got:
# AttributeError: 'tuple' object has no attribute 'append'
wb.save(...)
# Also tried this, and only this seems to work. Meaning replacing values with other correctly formatted, but I need this applied to whole column at once:
wb = load_workbook(...)
ws = wb['Lista polis']
daty_wystawienia = ws['G']
ws['G6'] = "19.05.06"
ws['G7'] = "19.05.06"
ws['G8'] = "19.05.06"
ws['G10'] = "19.05.07"
ws['G11'] = "19.05.07"
# or replace
for i in ws['G']:
    ws['G9'] = ws['G9'].value.replace('06-05-2019', '10000000000')
wb.save(...)
Is there any way to replace, append, or override existing values in Excel using openpyxl? I am stuck on this.
Thanks in advance.
If you just want Excel to change the format of the Cell in order to display the date as you like, this is how I did it for a column:
from openpyxl import load_workbook
book = load_workbook('Example.xlsx')
ws = book['Sheet1']
for x in range(1, 500):
    _cell = ws.cell(row=x, column=1)
    _cell.number_format = '[$-en-GB]dd-mmm-yyyy'
book.save("Dates.xlsx")
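A number format only changes how a real date is displayed; text such as '06-05-2019' has to be parsed first. The sketch below shows one way to update a whole column in place by assigning to each cell's value, assuming the dates are stored as 'dd-mm-YYYY' strings as in the question. The helper name `convert_date_column`, its parameters, and the sample data are hypothetical.

```python
import datetime

from openpyxl import Workbook


def convert_date_column(ws, column_letter, in_format="%d-%m-%Y", number_format="yy.mm.dd"):
    """Replace text dates in one column with real dates and set their display format."""
    for cell in ws[column_letter]:
        if not isinstance(cell.value, str):
            continue
        try:
            parsed = datetime.datetime.strptime(cell.value.strip(), in_format).date()
        except ValueError:
            continue  # header or other non-date text: leave untouched
        cell.value = parsed  # overwrite the existing cell instead of appending a row
        cell.number_format = number_format


wb = Workbook()
ws = wb.active
ws["G1"] = "Date of issuance"
ws["G2"] = "06-05-2019"
ws["G3"] = "07-05-2019"
convert_date_column(ws, "G")
```

Because each cell is overwritten where it stands, nothing is appended below the existing rows, which is what ws.append would do.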
Thanks for your effort. It looks beautiful, but for some reason it does not work for me.
I went about it like this:
def date_of_issuance():
    for i in ws.iter_rows():
        for cell in i:
            d_w = 'Date of issuance'
            if cell.value == d_w:
                c = cell.column
                col = column_index_from_string(c)
                r = cell.row
                for daty in ws[c]:
                    date_string = daty.value
                    if re.search('[0-9]', str(date_string)):
                        proper_date = datetime.datetime.strptime(date_string, '%d-%m-%Y').strftime('%y-%m-%d')
                        date = datetime.datetime.strptime(proper_date, '%y-%m-%d').date()
                        for j in range(1):
                            ws.cell(row=r+1, column=col, value=date)
                            r += 1

Horizontal text alignment in openpyxl

I'm trying to change the text alignment to the center of 2 merged cells. I've found some answers that didn't work for my case:
currentCell = ws.cell('A1')
currentCell.style.alignment.horizontal = 'center' #TypeError: cannot set horizontal attribute
#or
currentCell.style.alignment.vertical = Alignment.HORIZONTAL_CENTER #AttributeError: type object 'Alignment' has no attribute 'HORIZONTAL_CENTER'
Neither worked. Is there any other way to do it?
Yes, there is a way to do this with openpyxl:
from openpyxl.styles import Alignment
currentCell = ws['A1'] # older openpyxl releases also accepted ws.cell('A1')
currentCell.alignment = Alignment(horizontal='center')
Hope this helps.
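For the merged-cell case from the question, the alignment goes on the top-left cell of the merged range. This is a minimal in-memory sketch, assuming a reasonably recent openpyxl (2.x or later) where cell.alignment can be assigned directly; the range and text are example values.

```python
from openpyxl import Workbook
from openpyxl.styles import Alignment

wb = Workbook()
ws = wb.active

# Merge the two cells; A1 stays the cell that holds the value and the style.
ws.merge_cells("A1:B1")
ws["A1"] = "Example"
ws["A1"].alignment = Alignment(horizontal="center", vertical="center")

# Collect the merged ranges to confirm what was merged.
merged = [rng.coord for rng in ws.merged_cells.ranges]
print(merged)
```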
This is what finally worked for me with the latest version from PIP (2.2.5)
# center all cells
for col in w_sheet.columns:
    for cell in col:
        # openpyxl styles aren't mutable,
        # so you have to create a copy of the style, modify the copy, then set it back
        alignment_obj = cell.alignment.copy(horizontal='center', vertical='center')
        cell.alignment = alignment_obj
Update:
As of openpyxl version 2.4.0 (~2016) the .copy() method is deprecated for StyleProxy objects.
Try changing the last two lines to:
from copy import copy
alignment_obj = copy(cell.alignment)
alignment_obj.horizontal = 'center'
alignment_obj.vertical = 'center'
cell.alignment = alignment_obj
None of the other solutions worked for me, since I have to use openpyxl and, at least in 2.1.5, cell.alignment can't be set directly.
from openpyxl.styles import Style, Alignment, Font  # Font is used further down
cell = ws.cell('A1')
cell.style = cell.style.copy(alignment=Alignment(horizontal='center'))
The above copies the current style and replaces the alignment.
You can also create a whole new style - with any values not specified taking the default values from https://openpyxl.readthedocs.org/en/latest/styles.html
cell.style = Style(alignment=Alignment(horizontal='center'),font=Font(bold=True))
# or - a tidier way
vals = {'alignment':Alignment(horizontal='center'),
'font':Font(bold=True),
}
new_style = Style(**vals)
cell.style = new_style
It's my first time posting anything here.
I found a way to align text using openpyxl's Alignment:
i=3
while i < 253:
    cellc = ws.cell(row=i, column= 3)
    cellc.alignment = Alignment(horizontal="right")
    i+=1
I set i to the starting row and loop up to the length of my column.
I found the following code to be a pretty simple way to format every cell in your worksheet:
tot_rows = ws.max_row #get max row number
tot_cols = ws.max_column #get max column number
cols = range(1, tot_cols + 1) #1-based; +1 so the last column is included
rows = range(1, tot_rows + 1) #+1 so the last row is included
for c in cols:
    for r in rows:
        ws.cell(row=r, column=c).alignment = Alignment(horizontal='center', vertical='center')
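One thing to watch with index loops is that Python's range(1, n) stops at n - 1, so the last row and column are easy to miss. As an alternative sketch (assuming a recent openpyxl), ws.iter_rows() takes inclusive bounds and yields the cells directly; the sample data below is made up.

```python
from openpyxl import Workbook
from openpyxl.styles import Alignment

wb = Workbook()
ws = wb.active
for r in range(1, 4):
    ws.append([f"r{r}c1", f"r{r}c2", f"r{r}c3"])

centered = Alignment(horizontal="center", vertical="center")
# max_row and max_col are inclusive here, so the last row and column are covered.
for row in ws.iter_rows(min_row=1, max_row=ws.max_row, max_col=ws.max_column):
    for cell in row:
        cell.alignment = centered

last_cell = ws.cell(row=ws.max_row, column=ws.max_column)
```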
You can achieve this with the Python XlsxWriter library.
import xlsxwriter
workbook = xlsxwriter.Workbook('example.xlsx')
worksheet = workbook.add_worksheet()
cell_format = workbook.add_format({'align': 'center'})
# merge_range writes the text and applies the format to the merged cells in one call
worksheet.merge_range('A1:B1', 'Example', cell_format)
workbook.close()
Here I have merged cells A1 and B1 and passed a cell format whose align property is set to center.
