How to apply a defined function of format to a dataframe


I have multiple data frames (df, df1, df2, ...). I want to apply my own formatting to each one and then export them to Excel (Excel-file, Excel-file1, Excel-file2, ...).
My idea is to write a formatting function and apply it to each data frame, but I do not know how to go about this.
# Create a Pandas Excel writer using XlsxWriter
writer = pd.ExcelWriter(r'N:\Excel-file.xlsx', engine='xlsxwriter')
# Skip one row to insert a defined header, turn off the default header, and remove index
df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False, index=False)
# The xlsxwriter workbook and worksheet objects
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# The default format of the workbook
workbook.formats[0].set_font_size(12)
workbook.formats[0].set_align('right')
# Header format
header_format = workbook.add_format({
    'bold': True,
    'border': 0})
header_format.set_font_size(14)
header_format.set_align('center')
# Write the column headers with the defined format
for col_num, value in enumerate(df.columns.values):
    worksheet.write(0, col_num, value, header_format)
# Export the Excel file
writer.close()
The function I defined looks like this:
def format(df, file_name):
    with pd.ExcelWriter(r'file_name.xlsx', engine='xlsxwriter') as writer:
        df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False, index=False)
        workbook = writer.book
        worksheet = writer.sheets['Sheet1']
        workbook.formats[0].set_font_size(12)
        workbook.formats[0].set_align('right')
        header_format = workbook.add_format({
            'bold': True,
            'border': 0})
        header_format.set_font_size(14)
        header_format.set_align('center')
        for col_num, value in enumerate(df.columns.values):
            worksheet.write(0, col_num, value, header_format)
df_list = [df1, df2, df3, df4, df5, df6]
for i, df in enumerate(df_list):
    format(df, f"Excel-file{i+1}.xlsx")
The problem is that the format function is only applied to df6, and I get a single Excel file named "file_name". Is there any way to fix this?

pandas.DataFrame.apply() iterates over the data frame it is called on, by rows or by columns, applies the given function to each one, and returns one result per row (or column). The function therefore has to be written to process a single row (or column). Your logic, however, operates on an entire data frame, so apply() is not the right tool here.
I assume your problem statement is that you have multiple data frames (df, df1, df2, ...) and you want to export them to individual Excel files after applying some common formatting logic.
You can collect them into a list and process them one by one in a loop. Since format() does not return a result (it does not transform the list element), a plain for loop is the way to go. Also consider using the with statement so that the file is closed automatically, which avoids leaked file handles.
def format(df, file_name):
    # Use the file_name argument: r'file_name.xlsx' is a literal string,
    # so every call would overwrite the same file.
    with pd.ExcelWriter(file_name, engine='xlsxwriter') as writer:
        df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False, index=False)
        workbook = writer.book
        worksheet = writer.sheets['Sheet1']
        workbook.formats[0].set_font_size(12)
        workbook.formats[0].set_align('right')
        header_format = workbook.add_format({
            'bold': True,
            'border': 0})
        header_format.set_font_size(14)
        header_format.set_align('center')
        for col_num, value in enumerate(df.columns.values):
            worksheet.write(0, col_num, value, header_format)
df_list = [df1, df2, df3, ...]
for i, df in enumerate(df_list):
    format(df, f"Excel-file{i+1}.xlsx")
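One detail worth spelling out: in the question's function, r'file_name.xlsx' is a string literal, not the file_name parameter, so every call writes to the same path and the last data frame overwrites the earlier ones. A minimal sketch of the difference, with no pandas involved (literal_path and parameter_path are hypothetical helpers used only for illustration):

```python
def literal_path(file_name):
    # Bug pattern: the parameter is ignored and the same literal is returned.
    return r'file_name.xlsx'


def parameter_path(file_name):
    # Fix: use the value that was actually passed in.
    return file_name


names = [f"Excel-file{i + 1}.xlsx" for i in range(3)]

# Every call yields the same path, so only one file would ever exist.
buggy = {literal_path(n) for n in names}

# Each call yields its own path, so one file per data frame.
fixed = [parameter_path(n) for n in names]

print(buggy)
print(fixed)
```

With the literal, the set of output paths collapses to a single entry, which matches the single "file_name" workbook described in the question.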

Related

Pandas dataframe. Add an aditional row header merging all columns

I want to add a second header to my Excel file using a pandas data frame.
The file already has its values and header, but I want to add a new row above the header consisting of a single cell that spans all the header columns, with the text centered.
How can I do this?

Use MultiIndex.from_product, but text is not centered:
df.columns = pd.MultiIndex.from_product([['Result'], df.columns])
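To make the effect of that line concrete, here is a small sketch on a made-up three-column frame (the data and column names are assumptions for illustration). It only shows how the column labels change; it does not write an Excel file.

```python
import pandas as pd

# Hypothetical sample frame with three ordinary columns.
df = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "C"])

# Put every existing column under one top-level label, 'Result'.
df.columns = pd.MultiIndex.from_product([["Result"], df.columns])

# Each column label is now a (top level, original name) pair.
print(df.columns.tolist())
print(df.columns.nlevels)
```

Each original column ends up under the shared 'Result' label, which pandas writes as a second header row on export.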
EDIT:
import numpy as np
import pandas as pd
# Creating a DataFrame
df = pd.DataFrame(np.random.randn(8, 6), columns=list('ABCDEF'))
# Create a Pandas Excel writer using XlsxWriter engine.
writer = pd.ExcelWriter("test.xlsx", engine='xlsxwriter')
# Write the data starting at the second row; the first row is kept for the merged header
df.to_excel(writer, sheet_name='Sheet1', startrow=1, index=False)
# Get workbook and worksheet objects
workbook = writer.book
worksheet = writer.sheets['Sheet1']
merge_format = workbook.add_format({'align': 'center'})
# Set merge_range by the number of column names
len_cols = len(df.columns)
worksheet.merge_range(0, 0, 0, len_cols - 1, 'Result', merge_format)
writer.close()

Is there a way to set pandas to default YYYY-MM-DD instead of YYYY-MM-DD 00:00:00?

Can I set pandas to default to YYYY-MM-DD? I am getting YYYY-MM-DD 00:00:00. Is there a way to make sure that, by default, the zeros don't appear when I export to Excel/CSV?
Updated per comment request:
I have a function that looks like this:
x1 = my_funct('Unemployment', '2004-01-04 2009-01-04', 'DK', 'Unemployment (Denmark)')
Then I create a df out of it:
df1 = pd.DataFrame(x1)
along with others:
# this concats the df horizontally
df_merged1 = pd.concat([df1, df0, df2, df0, df3, df0, df4], axis=1)
df_merged1.reset_index(inplace=True)
Then I export that to excel:
writer = pd.ExcelWriter('Test1.xlsx', engine='xlsxwriter')
df_merged1.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False, index=False)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a header format.
header_format = workbook.add_format({
    'bold': True,
    'text_wrap': True,
    'valign': 'top',
    'fg_color': '#D7E4BC',
    'border': 1})
# Write the column headers with the defined format.
for col_num, value in enumerate(df_merged1.columns.values):
    worksheet.write(0, col_num, value, header_format)
format = workbook.add_format()
format.set_align('center')
format.set_align('vcenter')
worksheet.set_column(0, 11, 30, format)
writer.close()
The exported Excel file has multiple date columns, each showing the extra 00:00:00 at the end. Is it possible to show only YYYY-MM-DD?
Thanks

The solution is to pass datetime_format when creating the ExcelWriter.
Create a tiny data frame for testing:
import pandas as pd
from datetime import datetime
df = pd.DataFrame([datetime(2021, 3, 4, 20, 48, 5)])
This is the data frame so far:
                    0
0 2021-03-04 20:48:05
Create the writer:
writer = pd.ExcelWriter("exemple.xlsx", datetime_format='hh:mm:ss')
df.to_excel(writer, sheet_name="Sheet1")
writer.close()
Note: I used hh:mm:ss, but it could be any format; for the output asked for in the question, use 'yyyy-mm-dd'.
For more details, see the ExcelWriter documentation.
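The question also mentions CSV export, which does not go through ExcelWriter. A minimal sketch for that case, assuming a made-up column name "when": DataFrame.to_csv accepts a date_format argument that takes a strftime pattern.

```python
import pandas as pd

# Hypothetical sample data with a time component, as in the example above.
df = pd.DataFrame({"when": pd.to_datetime(["2021-03-04 20:48:05"]),
                   "value": [1.5]})

# date_format is applied to datetime columns when the CSV text is produced.
csv_text = df.to_csv(index=False, date_format="%Y-%m-%d")
print(csv_text)
```

Note that Excel formats (yyyy-mm-dd) and strftime patterns (%Y-%m-%d) use different syntax, so the two are not interchangeable.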

How to apply Format object to index values in pandas

Question: I have a data frame with column names and their values, but when I apply a Format object to the column headings, it has no effect.
Code:
import pandas as pd
# Backslashes are escaped so that the path is a valid string literal
root = "C:\\Users\\543904\\Desktop\\New folder\\"
data = {'name': ["aparna", "pankaj"],
        'degree': ["MBA", "BCA"],
        'score': [90, 40]}
df = pd.DataFrame(data)
writer = pd.ExcelWriter(root + 'output.xlsx', engine="xlsxwriter")
df.to_excel(writer, sheet_name='df', index=False)
workbook = writer.book
worksheet = writer.sheets['df']
Format_Object = workbook.add_format({'text_wrap': True})
Format_Object.set_bold()
Format_Object.set_align('center')
Format_Object.set_align('top')
Format_Object.set_border(1)
Format_Object.set_bg_color('#0ef0ce')
worksheet.set_row(0, 20, Format_Object)
writer.close()

This is explained in the XlsxWriter docs on Working with Python Pandas and XlsxWriter:
Pandas writes the dataframe header with a default cell format. Since it is a cell format it cannot be overridden using set_row(). If you wish to use your own format for the headings then the best approach is to turn off the automatic header from Pandas and write your own. For example:
# Turn off the default header and skip one row to allow us to insert a
# user defined header.
df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a header format.
header_format = workbook.add_format({
    'bold': True,
    'text_wrap': True,
    'valign': 'top',
    'fg_color': '#D7E4BC',
    'border': 1})
# Write the column headers with the defined format.
for col_num, value in enumerate(df.columns.values):
    worksheet.write(0, col_num + 1, value, header_format)

painting a cell in excel with condition using python

I am creating an excel report that should give me a result of automatic tests. It should say if they failed/ passed.
I have created the excel report from csv using this code:
import os
import string
import pandas as pd
writer = pd.ExcelWriter("file.xlsx", engine="xlsxwriter")
df = pd.read_csv("K:\\results.csv")
df.to_excel(writer, sheet_name=os.path.basename("K:\\results.csv"))
# skip 2 rows
df.to_excel(writer, sheet_name='Sheet1', startrow=2, header=False, index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a header format.
header_format = workbook.add_format({
    'bold': True,
    'fg_color': '#ffcccc',
    'border': 1})
# create dictionary for map length of columns (covers columns A to Z only)
d = dict(zip(range(26), string.ascii_uppercase))
print(d)
max_len = d[len(df.columns) - 1]
print(max_len)
# C
# dynamically set merged columns in first row
worksheet.merge_range('A1:' + max_len + '1', 'This Sheet is for Personal Details')
for col_num, value in enumerate(df.columns.values):
    # write to second row
    worksheet.write(1, col_num, value, header_format)
    column_len = df[value].astype(str).str.len().max()
    column_len = max(column_len, len(value)) + 3
    worksheet.set_column(col_num, col_num, column_len)
writer.close()
Now, if a cell contains the word "success", I want to color it green, and if a cell says "fail", I want to color it red. How can I access specific cells in the Excel file based on what is written in them?
Thanks.

You could use a conditional format for this:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': ['success', 'bar', 'fail', 'foo', 'success']})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a format for fail. Light red fill with dark red text.
fail_format = workbook.add_format({'bg_color': '#FFC7CE',
                                   'font_color': '#9C0006'})
# Add a format for pass. Green fill with dark green text.
pass_format = workbook.add_format({'bg_color': '#C6EFCE',
                                   'font_color': '#006100'})
# Apply conditional formats to the cell range.
worksheet.conditional_format('B2:B6', {'type': 'text',
                                       'criteria': 'containing',
                                       'value': 'fail',
                                       'format': fail_format})
worksheet.conditional_format('B2:B6', {'type': 'text',
                                       'criteria': 'containing',
                                       'value': 'success',
                                       'format': pass_format})
# Close the Pandas Excel writer and output the Excel file.
writer.close()
See the XlsxWriter docs on Working with Conditional Formatting. Note that you can also pass a numerical (row, col) range instead of an A1-style range such as B2:B6; see conditional_format().
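The question builds its merged header range from a lookup table of single letters, which cannot go past column Z. As a rough sketch, a small pure-Python helper can build A1-style ranges for any column count. The names col_to_name and header_range are hypothetical and not part of pandas or XlsxWriter; XlsxWriter ships comparable helpers in xlsxwriter.utility, such as xl_col_to_name.

```python
def col_to_name(col):
    """Convert a zero-based column index to Excel letters (0 -> 'A', 26 -> 'AA')."""
    name = ""
    col += 1  # work with a one-based number, as Excel does
    while col > 0:
        col, rem = divmod(col - 1, 26)
        name = chr(ord("A") + rem) + name
    return name


def header_range(n_cols, row=1):
    """A1-style range covering the first n_cols cells of the given one-based row."""
    return f"A{row}:{col_to_name(n_cols - 1)}{row}"


print(col_to_name(0), col_to_name(25), col_to_name(26))
print(header_range(3))
print(header_range(30))
```

The resulting string can be passed wherever an A1-style range is expected, for example as the first argument of merge_range in the question's code.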

Open existing workbook with ExcelWriter [duplicate]

I use pandas to write to excel file in the following fashion:
import pandas
writer = pandas.ExcelWriter('Masterfile.xlsx')
data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])
writer.save()
Masterfile.xlsx already contains a number of different tabs. However, it does not yet contain "Main".
Pandas correctly writes to the "Main" sheet, but unfortunately it also deletes all the other tabs.

The pandas docs say it uses openpyxl for xlsx files. A quick look through the code of ExcelWriter suggests that something like this might work:
import pandas
from openpyxl import load_workbook
book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl')
writer.book = book
## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])
writer.save()

UPDATE: Starting from Pandas 1.3.0 the following function will not work properly, because functions DataFrame.to_excel() and pd.ExcelWriter() have been changed - a new if_sheet_exists parameter has been introduced, which has invalidated the function below.
Here you can find an updated version of the append_df_to_excel(), which is working for Pandas 1.3.0+.
Here is a helper function:
import os
import pandas as pd
from openpyxl import load_workbook
def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.
    @param filename: File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
    @param df: DataFrame to save to workbook
    @param sheet_name: Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
    @param startrow: upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
    @param truncate_sheet: truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
    @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
                            [can be a dictionary]
    @return: None
    Usage examples:
    >>> append_df_to_excel('d:/temp/test.xlsx', df)
    >>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)
    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False)
    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False, startrow=25)
    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    # Excel file doesn't exist - saving and exiting
    if not os.path.isfile(filename):
        df.to_excel(
            filename,
            sheet_name=sheet_name,
            startrow=startrow if startrow is not None else 0,
            **to_excel_kwargs)
        return
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')
    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
    # try to open an existing workbook
    writer.book = load_workbook(filename)
    # get the last row in the existing Excel sheet
    # if it was not specified explicitly
    if startrow is None and sheet_name in writer.book.sheetnames:
        startrow = writer.book[sheet_name].max_row
    # truncate sheet
    if truncate_sheet and sheet_name in writer.book.sheetnames:
        # index of [sheet_name] sheet
        idx = writer.book.sheetnames.index(sheet_name)
        # remove [sheet_name]
        writer.book.remove(writer.book.worksheets[idx])
        # create an empty sheet [sheet_name] using old index
        writer.book.create_sheet(sheet_name, idx)
    # copy existing sheets
    writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    if startrow is None:
        startrow = 0
    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
    # save the workbook
    writer.save()
Tested with the following versions:
Pandas 1.2.3
Openpyxl 3.0.5

With openpyxl version 2.4.0 and pandas version 0.19.2, the process @ski came up with gets a bit simpler:
import pandas
from openpyxl import load_workbook
with pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') as writer:
    writer.book = load_workbook('Masterfile.xlsx')
    data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])
# That's it!

Starting in pandas 0.24 you can simplify this with the mode keyword argument of ExcelWriter:
import pandas as pd
with pd.ExcelWriter('the_file.xlsx', engine='openpyxl', mode='a') as writer:
    data_filtered.to_excel(writer)
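For pandas 1.3.0 or later, append mode can be combined with the if_sheet_exists parameter mentioned earlier in this thread. The following is a sketch under assumptions rather than a drop-in answer: it needs openpyxl installed, it writes to a temporary directory, and the sheet names and sample data are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical workbook in a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "Masterfile.xlsx")

# Create a workbook that already contains one sheet.
pd.DataFrame({"a": [1, 2]}).to_excel(path, sheet_name="Existing", index=False)

# Append a new sheet. mode="a" requires the openpyxl engine, and
# if_sheet_exists="replace" overwrites "Main" if it is already there.
data_filtered = pd.DataFrame({"Diff1": [0.1], "Diff2": [0.2]})
with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="replace") as writer:
    data_filtered.to_excel(writer, sheet_name="Main", index=False)

# Confirm that the original sheet survived and the new one was added.
with pd.ExcelFile(path) as xls:
    sheet_names = xls.sheet_names
print(sheet_names)
```

Running the append step a second time replaces "Main" instead of raising an error, which is the behaviour that if_sheet_exists controls.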

I know this is an older thread, but this is the first item you find when searching, and the above solutions don't work if you need to retain charts in a workbook that you already have created. In that case, xlwings is a better option - it allows you to write to the excel book and keeps the charts/chart data.
simple example:
import xlwings as xw
import pandas as pd
#create DF
months = ['2017-01','2017-02','2017-03','2017-04','2017-05','2017-06','2017-07','2017-08','2017-09','2017-10','2017-11','2017-12']
value1 = [x * 5+5 for x in range(len(months))]
df = pd.DataFrame(value1, index = months, columns = ['value1'])
df['value2'] = df['value1']+5
df['value3'] = df['value2']+5
#load workbook that has a chart in it
wb = xw.Book('C:\\data\\bookwithChart.xlsx')
ws = wb.sheets['chartData']
ws.range('A1').options(index=False).value = df
# save the modified book under a new name
wb.save('C:\\data\\bookwithChart_updated.xlsx')
xw.apps[0].quit()

Old question, but I am guessing some people still search for this - so...
I find this method nice because all worksheets are loaded into a dictionary of sheet name and dataframe pairs, created by pandas with the sheet_name=None option. It is simple to add, delete or modify worksheets between reading the spreadsheet into the dict format and writing it back from the dict. For me, xlsxwriter works better than openpyxl for this particular task in terms of speed and format.
Note: pandas 0.21.0 renamed the "sheetname" parameter of read_excel to "sheet_name"; the code below uses the new name.
# read a single or multi-sheet excel file
# (returns dict of sheet name(s), dataframe(s))
ws_dict = pd.read_excel(excel_file_path,
                        sheet_name=None)
# all worksheets are accessible as dataframes.
# easy to change a worksheet as a dataframe:
mod_df = ws_dict['existing_worksheet']
# do work on mod_df...then reassign
ws_dict['existing_worksheet'] = mod_df
# add a dataframe to the workbook as a new worksheet with
# ws name, df as dict key, value:
ws_dict['new_worksheet'] = some_other_dataframe
# when done, write dictionary back to excel...
# xlsxwriter honors datetime and date formats
# (only included as example)...
with pd.ExcelWriter(excel_file_path,
                    engine='xlsxwriter',
                    datetime_format='yyyy-mm-dd',
                    date_format='yyyy-mm-dd') as writer:
    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)
For the example in the 2013 question:
ws_dict = pd.read_excel('Masterfile.xlsx',
                        sheet_name=None)
ws_dict['Main'] = data_filtered[['Diff1', 'Diff2']]
with pd.ExcelWriter('Masterfile.xlsx',
                    engine='xlsxwriter') as writer:
    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

There is a better solution in pandas 0.24:
with pd.ExcelWriter(path, mode='a') as writer:
    s.to_excel(writer, sheet_name='another sheet', index=False)
So upgrade your pandas now:
pip install --upgrade pandas

The solution by @MaxU does not work with recent versions of Python and the related packages. It raises the error:
"zipfile.BadZipFile: File is not a zip file"
I wrote a new version of the function that works with the updated packages, tested with python: 3.9 | openpyxl: 3.0.6 | pandas: 1.2.3
I also added more features to the helper function:
It now resizes all columns based on cell content width, so all values are visible (see "resizeColumns")
You can choose whether NaN is displayed as NaN or as empty cells (see "na_rep")
Added "startcol": you can choose the column to start writing from; otherwise it starts from col = 0
Here is the function:
import pandas as pd
def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None, startcol=None,
                       truncate_sheet=False, resizeColumns=True, na_rep='NA', **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.
    Parameters:
    filename : File path or existing ExcelWriter
               (Example: '/path/to/file.xlsx')
    df : dataframe to save to workbook
    sheet_name : Name of sheet which will contain DataFrame.
                 (default: 'Sheet1')
    startrow : upper left cell row to dump data frame.
               Per default (startrow=None) calculate the last row
               in the existing DF and write to the next row...
    truncate_sheet : truncate (remove and recreate) [sheet_name]
                     before writing DataFrame to Excel file
    resizeColumns: default = True. Resizes all columns based on cell content width
    to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                      [can be dictionary]
    na_rep: default = 'NA'. If, instead of NaN, you want blank cells, just edit as follows: na_rep=''
    Returns: None
    *******************
    CONTRIBUTION:
    Current helper function generated by [Baggio]: https://stackoverflow.com/users/14302009/baggio?tab=profile
    Contributions to the current helper function: https://stackoverflow.com/users/4046632/buran?tab=profile
    Original helper function: (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    Features of the new helper function:
    1) Now it works with python 3.9 and latest versions of pandas and openpyxl
    ---> Fixed the error: "zipfile.BadZipFile: File is not a zip file".
    2) Now it resizes all columns based on cell content width AND all variables will be visible (SEE "resizeColumns")
    3) You can handle NaN, if you want that NaN are displayed as NaN or as empty cells (SEE "na_rep")
    4) Added "startcol", you can decide to start to write from specific column, otherwise will start from col = 0
    *******************
    """
    from openpyxl import load_workbook
    from string import ascii_uppercase
    from openpyxl.utils import get_column_letter
    from openpyxl import Workbook
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')
    try:
        # check that the file exists and can be opened, then close it again
        with open(filename):
            pass
    except IOError:
        # file not accessible: create an empty workbook with the target sheet
        wb = Workbook()
        ws = wb.active
        ws.title = sheet_name
        wb.save(filename)
    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError
    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)
        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row
        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)
        # copy existing sheets
        writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass
    if startrow is None:
        # startrow = -1
        startrow = 0
    if startcol is None:
        startcol = 0
    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, startcol=startcol, na_rep=na_rep, **to_excel_kwargs)
    if resizeColumns:
        ws = writer.book[sheet_name]
        def auto_format_cell_width(ws):
            # openpyxl columns are 1-based; + 1 so the last column is resized too
            for letter in range(1, ws.max_column + 1):
                maximum_value = 0
                for cell in ws[get_column_letter(letter)]:
                    val_to_check = len(str(cell.value))
                    if val_to_check > maximum_value:
                        maximum_value = val_to_check
                ws.column_dimensions[get_column_letter(letter)].width = maximum_value + 2
        auto_format_cell_width(ws)
    # save the workbook
    writer.save()
Example Usage:
# Create a sample dataframe
df = pd.DataFrame({'numbers': [1, 2, 3],
                   'colors': ['red', 'white', 'blue'],
                   'colorsTwo': ['yellow', 'white', 'blue'],
                   'NaNcheck': [float('NaN'), 1, float('NaN')],
                   })
# EDIT YOUR PATH FOR THE EXPORT
filename = r"C:\DataScience\df.xlsx"
# RUN ONE BY ONE IN ROW THE FOLLOWING LINES, TO SEE THE DIFFERENT UPDATES TO THE EXCELFILE
append_df_to_excel(filename, df, index=False, startrow=0) # Basic Export of df in default sheet (Sheet1)
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0) # Append the sheet "Cool" where "df" is written
append_df_to_excel(filename, df, sheet_name="Cool", index=False) # Append another "df" to the sheet "Cool", just below the other "df" instance
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0, startcol=5) # Append another "df" to the sheet "Cool" starting from col 5
append_df_to_excel(filename, df, index=False, truncate_sheet=True, startrow=10, na_rep = '') # Override (truncate) the "Sheet1", writing the df from row 10, and showing blank cells instead of NaN

def append_sheet_to_master(self, master_file_path, current_file_path, sheet_name):
    try:
        master_book = load_workbook(master_file_path)
        master_writer = pandas.ExcelWriter(master_file_path, engine='openpyxl')
        master_writer.book = master_book
        master_writer.sheets = dict((ws.title, ws) for ws in master_book.worksheets)
        current_frames = pandas.ExcelFile(current_file_path).parse(pandas.ExcelFile(current_file_path).sheet_names[0],
                                                                   header=None,
                                                                   index_col=None)
        current_frames.to_excel(master_writer, sheet_name, index=None, header=False)
        master_writer.save()
    except Exception as e:
        raise e
This works perfectly fine; the only drawback is that the formatting of the master file (the file to which we add the new sheet) is lost.

writer = pd.ExcelWriter('prueba1.xlsx', engine='openpyxl', keep_date_col=True)
I hope "keep_date_col" helps you (note that keep_date_col is documented for pandas.read_csv, not for ExcelWriter).

I used the answer described here
from openpyxl import load_workbook
writer = pd.ExcelWriter(p_file_name, engine='openpyxl', mode='a')
writer.book = load_workbook(p_file_name)
writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
df.to_excel(writer, 'Data', startrow=10, startcol=20)
writer.save()

book = load_workbook(xlsFilename)
writer = pd.ExcelWriter(self.xlsFilename)
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name=sheetName, index=False)
writer.save()

The solution by @MaxU worked very well. I have just one suggestion:
If truncate_sheet=True is specified, then startrow should NOT be taken from the existing sheet. I suggest:
if startrow is None and sheet_name in writer.book.sheetnames:
    if not truncate_sheet:  # truncate_sheet would use startrow if provided (or zero below)
        startrow = writer.book[sheet_name].max_row
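The row-selection rule suggested here can be isolated into a tiny function, which makes the cases easier to reason about. This is only a sketch: resolve_startrow is a hypothetical helper, and max_row stands in for writer.book[sheet_name].max_row.

```python
def resolve_startrow(startrow, sheet_exists, truncate_sheet, max_row):
    """Return the row at which writing should start (hypothetical helper)."""
    if startrow is None and sheet_exists and not truncate_sheet:
        # Appending to an existing sheet: continue below the current data.
        return max_row
    if startrow is None:
        # New or truncated sheet with no explicit row: start at the top.
        return 0
    # An explicitly requested row is always respected.
    return startrow


print(resolve_startrow(None, True, False, 12))   # append below existing rows
print(resolve_startrow(None, True, True, 12))    # truncated sheet starts at the top
print(resolve_startrow(25, True, True, 12))      # explicit startrow is kept
```

The second call is the case the suggestion addresses: without the extra check, a truncated sheet would inherit the old max_row and start writing far below the top.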

I'd recommend using xlwings (https://docs.xlwings.org/en/stable/api.html); it is really powerful for this application. This is how I use it:
import xlwings as xw
import pandas as pd
import xlsxwriter
# function to get the active workbook
def getActiveWorkbook():
    try:
        # logic from xlwings to grab the current excel file
        activeWb = xw.books.active
    except Exception:
        # stop with an error message if unable to get the current workbook
        raise SystemExit('Unable to grab the current Workbook')
    else:
        return activeWb
# function that returns the last row number and last cell of a sheet
def getLastRow(myBook, sheetName):
    lastCell = myBook.sheets[sheetName].range("A1").current_region.last_cell
    lastRow = lastCell.row
    # xlwings column numbers are 1-based, xl_col_to_name expects a 0-based index
    lastCol = str(xlsxwriter.utility.xl_col_to_name(lastCell.column - 1))
    return str(lastRow), lastCol + str(lastRow)
activeWb = getActiveWorkbook()
df = pd.DataFrame(data=[1,2,3])
# look at worksheet = Part Number Status
sheetName = "Sheet1"
ws = activeWb.sheets[sheetName]
lastRow, lastCell = getLastRow(activeWb, sheetName)
if int(lastRow) > 1:
    ws.range("A1:" + lastCell).clear()
ws.range("A1").options(index=False, header=False).value = df.fillna('')
This works very well for my applications because .xlsm workbooks can be very tricky. You can execute this as a Python script, or turn it into an executable with pyinstaller and then run the .exe through an Excel macro. You can also call VBA macros from Python using xlwings, which is very useful.

You can write to an existing Excel file without overwriting its data by opening it with pandas.ExcelWriter in append mode (mode='a') and passing the writer to pandas.DataFrame.to_excel(). DataFrame.to_excel() itself does not accept a mode argument.
Here's an example:
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
# Write the DataFrame to an existing Excel file in append mode
# (by default the sheet name must not already exist in the workbook)
with pd.ExcelWriter('existing_file.xlsx', engine='openpyxl', mode='a') as writer:
    df.to_excel(writer, index=False, sheet_name='NewSheet')

Method:
Creates the file if it is not present
Appends to an existing Excel file by sheet name
import pandas as pd
from openpyxl import load_workbook
def write_to_excel(df, file, **kwds):
    # **kwds is forwarded to DataFrame.to_excel (sheet_name, columns, index, ...)
    try:
        book = load_workbook(file)
        writer = pd.ExcelWriter(file, engine='openpyxl')
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
        df.to_excel(writer, **kwds)
        writer.save()
    except FileNotFoundError as e:
        df.to_excel(file, **kwds)
Usage:
df_a = pd.DataFrame(range(10), columns=["a"])
df_b = pd.DataFrame(range(10, 20), columns=["b"])
write_to_excel(df_a, "test.xlsx", sheet_name="Sheet a", columns=['a'], index=False)
write_to_excel(df_b, "test.xlsx", sheet_name="Sheet b", columns=['b'])
