I have an Excel file (.xls format) with 5 sheets. I want to replace the contents of sheet 5 with the contents of my pandas data frame.
Given these requirements, you will need both Python (to export the pandas data frame) and VBA (to delete the existing worksheet content and copy/paste the external data).
With Python: use the to_csv or to_excel methods. I recommend the to_csv method which performs better with larger datasets.
# DF TO EXCEL
from pandas import ExcelWriter
writer = ExcelWriter('PythonExport.xlsx')
yourdf.to_excel(writer, sheet_name='Sheet5')
writer.close()  # writer.save() was removed in pandas 2.0
# DF TO CSV
yourdf.to_csv('PythonExport.csv', sep=',')
With VBA: copy and paste source to destination ranges.
Fortunately, in VBA you can call Python scripts using Shell (assuming your OS is Windows).
Sub DataFrameImport()
    'RUN PYTHON TO EXPORT DATA FRAME
    Shell "C:\pathTo\python.exe fullpathOfPythonScript.py", vbNormalFocus
    'CLEAR EXISTING CONTENT
    ThisWorkbook.Worksheets(5).Cells.Clear
    'COPY AND PASTE TO WORKBOOK
    Workbooks("PythonExport").Worksheets(1).Cells.Copy
    ThisWorkbook.Worksheets(5).Range("A1").Select
    ThisWorkbook.Worksheets(5).Paste
End Sub
Alternatively, you can do vice versa: run a macro (ClearExistingContent) with Python. Be sure your Excel file is a macro-enabled (.xlsm) one with a saved macro to delete Sheet 5 content only. Note: macros cannot be saved with csv files.
import os
import win32com.client
from pandas import ExcelWriter
# Raw strings keep the backslashes in the Windows paths literal
if os.path.exists(r"C:\Full Location\To\excelsheet.xlsm"):
    xlApp = win32com.client.Dispatch("Excel.Application")
    wb = xlApp.Workbooks.Open(Filename=r"C:\Full Location\To\excelsheet.xlsm")
    # MACRO TO CLEAR SHEET 5 CONTENT
    xlApp.Run("ClearExistingContent")
    wb.Save()
    xlApp.Quit()
    del xlApp
# WRITE IN DATA FRAME TO SHEET 5
writer = ExcelWriter(r'C:\Full Location\To\excelsheet.xlsm')
yourdf.to_excel(writer, sheet_name='Sheet5')
writer.close()  # writer.save() was removed in pandas 2.0
Or you can do it like this:
your_df.to_excel(
    r'C:\Users\full_path\excel_name.xlsx',
    sheet_name='your_sheet_name'
)
I tested the previous answers here. If the other four sheets must remain, those answers do not work, because the other four sheets are deleted. To keep them, use xlwings:
import xlwings as xw
import pandas as pd
filename = "test.xlsx"
df = pd.DataFrame([
    ("a", 1, 8, 3),
    ("b", 1, 2, 5),
    ("c", 3, 4, 6),
], columns=['one', 'two', 'three', "four"])
app = xw.App(visible=False)
wb = xw.Book(filename)
ws = wb.sheets["Sheet5"]
ws.clear()
ws["A1"].options(pd.DataFrame, header=1, index=False, expand='table').value = df
# If formatting of column names and index is needed as xlsxwriter does it,
# the following lines will do it (if the dataframe is not multiindex).
ws["A1"].expand("right").api.Font.Bold = True
ws["A1"].expand("down").api.Font.Bold = True
ws["A1"].expand("right").api.Borders.Weight = 2
ws["A1"].expand("down").api.Borders.Weight = 2
wb.save(filename)
app.quit()
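If Excel itself is not installed (xlwings drives a local Excel installation), something similar may be possible with pandas alone for .xlsx files. The sketch below assumes pandas 1.3 or later with the openpyxl engine, where ExcelWriter accepts mode="a" and if_sheet_exists="replace". The file name demo.xlsx, the helper replace_sheet, and the sample data are made up for illustration. Note that openpyxl does not read the legacy .xls format, so the file from the question would first have to be saved as .xlsx, and cell formatting on the replaced sheet is not carried over.

```python
import os
import tempfile

import pandas as pd


def replace_sheet(path, df, sheet_name):
    """Replace one sheet of an existing .xlsx file, leaving the other sheets in place.

    Hypothetical helper; assumes pandas >= 1.3 and the openpyxl engine.
    """
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)


# Build a throwaway workbook with five sheets to stand in for the real file.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    for i in range(1, 6):
        pd.DataFrame({"old": [i]}).to_excel(writer, sheet_name=f"Sheet{i}", index=False)

# Replace only Sheet5 with new data.
replace_sheet(path, pd.DataFrame({"new": [10, 20]}), "Sheet5")

# Read every sheet back to confirm that Sheet1..Sheet4 survived.
sheets = pd.read_excel(path, sheet_name=None)
print(list(sheets))
```

Unlike the xlwings version, this rewrites the workbook through openpyxl, so anything openpyxl does not model may be lost on save.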
I can't find an answer (or one I know how to implement) when it comes to using the excel "hyperlink" style for a column when exporting using pd.to_excel.
I can find plenty of (OLD) answers on using xlsxwriter or openpyxl. But none using the current pandas functionality.
I think it might be possible now with the updates to the .style function? But I don't know how to implement the CSS2.2 rules to emulate the hyperlink style.
import pandas as pd
df = pd.DataFrame({'ID':1, 'link':['=HYPERLINK("http://www.someurl.com", "some website")']})
df.to_excel('test.xlsx')
The desired output is for the link column to be the standard blue underlined text that turns purple once the link has been clicked.
Is there a way to use the built-in Excel styling? Or would you have to pass various CSS properties through a dictionary using .style?
Here is one way to do it using xlsxwriter as the Excel engine:
import pandas as pd
df = pd.DataFrame({'ID': [1, 2],
                   'link': ['=HYPERLINK("http://www.python.org", "some website")',
                            '=HYPERLINK("http://www.python.org", "some website")']})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter objects from the dataframe writer object.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Get the default URL format.
url_format = workbook.get_default_url_format()
# Apply it to the appropriate column, and widen the column.
worksheet.set_column(2, 2, 40, url_format)
# Close the Pandas Excel writer and output the Excel file.
writer.close()  # writer.save() was removed in pandas 2.0
Output, note that the second link has been clicked and is a different color:
Note, it would be preferable to use the xlsxwriter worksheet.write_url() method, since that will look like a native Excel URL to the end user and doesn't need the above trick of getting and applying the URL format. However, that method can't be used directly from a pandas dataframe (unlike the formula), so you would need to iterate through the link column of the dataframe and overwrite the formulas programmatically with actual links.
Something like this:
import pandas as pd
df = pd.DataFrame({'ID': [1, 2],
                   'link': ['=HYPERLINK("http://www.python.org", "some website")',
                            '=HYPERLINK("http://www.python.org", "some website")']})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('test2.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the worksheet handle.
worksheet = writer.sheets['Sheet1']
# Widen the column for clarity
worksheet.set_column(2, 2, 40)
# Overwrite the urls
worksheet.write_url(1, 2, "http://www.python.org", None, "some website")
worksheet.write_url(2, 2, "http://www.python.org", None, "some website")
# Close the Pandas Excel writer and output the Excel file.
writer.close()  # writer.save() was removed in pandas 2.0
Output:
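The iteration mentioned in the note above could be sketched roughly as follows. This is only an illustration: it assumes the dataframe keeps the target address and the display text in two separate columns (the hypothetical names url and text) instead of a HYPERLINK formula, and it writes to a temporary file named links.xlsx.

```python
import os
import tempfile

import pandas as pd

# Assumed layout: plain URL and display text in separate columns (hypothetical names).
df = pd.DataFrame({
    "ID": [1, 2],
    "url": ["http://www.python.org", "http://www.python.org"],
    "text": ["some website", "some website"],
})

path = os.path.join(tempfile.mkdtemp(), "links.xlsx")
with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
    # Write the visible columns first; index=False puts "text" in column 1.
    df[["ID", "text"]].to_excel(writer, sheet_name="Sheet1", index=False)
    worksheet = writer.sheets["Sheet1"]
    worksheet.set_column(1, 1, 40)
    # Row 0 holds the header, so the data rows start at row 1.
    for row_num, (url, text) in enumerate(zip(df["url"], df["text"]), start=1):
        # Overwrite the plain text cell with a real hyperlink.
        worksheet.write_url(row_num, 1, url, None, text)

# Read the file back to check the display text that ended up in the cells.
check = pd.read_excel(path)
print(check)
```

Passing None as the format lets xlsxwriter apply its default hyperlink style to each cell.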
I am having trouble updating an Excel sheet using pandas by writing new values into it. I already have an existing frame df1 that reads the values from MySheet1.xlsx, so this needs to either be a new dataframe or somehow copy and overwrite the existing one.
The spreadsheet is in this format:
I have a python list: values_list = [12.34, 17.56, 12.45]. My goal is to insert the list values under Col_C header vertically. It is currently overwriting the entire dataframe horizontally, without preserving the current values.
df2 = pd.DataFrame({'Col_C': values_list})
writer = pd.ExcelWriter('excelfile.xlsx', engine='xlsxwriter')
df2.to_excel(writer, sheet_name='MySheet1')
workbook = writer.book
worksheet = writer.sheets['MySheet1']
How to get this end result? Thank you!
Below I've provided a fully reproducible example of how you can go about modifying an existing .xlsx workbook using pandas and the openpyxl module (link to Openpyxl Docs).
First, for demonstration purposes, I create a workbook called test.xlsx:
from openpyxl import load_workbook
import pandas as pd
writer = pd.ExcelWriter('test.xlsx', engine='openpyxl')
wb = writer.book
df = pd.DataFrame({'Col_A': [1,2,3,4],
                   'Col_B': [5,6,7,8],
                   'Col_C': [0,0,0,0],
                   'Col_D': [13,14,15,16]})
df.to_excel(writer, index=False)
wb.save('test.xlsx')
This is the Expected output at this point:
In this second part, we load the existing workbook ('test.xlsx') and modify the third column with different data.
from openpyxl import load_workbook
import pandas as pd
df_new = pd.DataFrame({'Col_C': [9, 10, 11, 12]})
wb = load_workbook('test.xlsx')
ws = wb['Sheet1']
for index, row in df_new.iterrows():
    # Row 1 holds the header, so dataframe row 0 goes to C2
    cell = 'C%d' % (index + 2)
    ws[cell] = row['Col_C']
wb.save('test.xlsx')
This is the Expected output at the end:
In my opinion, the easiest solution is to read the Excel file into a pandas dataframe, modify it, and write it back out as an Excel file. For example:
Comments:
Import pandas as pd.
Read the Excel sheet into a pandas dataframe called ExcelDataInPandasDataFrame.
Take your data, which could be in a list, and assign it to the column you want (just make sure the lengths are the same).
Save your dataframe as an Excel file, either overwriting the old file or creating a new one.
Code:
import pandas as pd
ExcelDataInPandasDataFrame = pd.read_excel("./YourExcel.xlsx")
YourDataInAList = [12.34,17.56,12.45]
ExcelDataInPandasDataFrame["Col_C"] = YourDataInAList
ExcelDataInPandasDataFrame.to_excel("./YourNewExcel.xlsx", index=False)
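The "same length" caveat matters because assigning a plain list of the wrong length raises a ValueError. One possible way around it, sketched below on an in-memory frame with made-up sample values, is to wrap the list in a pandas Series so that the assignment aligns on the index and leaves the remaining rows as NaN.

```python
import pandas as pd

# Stand-in for the frame returned by pd.read_excel (sample values are made up).
df = pd.DataFrame({
    "Col_A": [1, 2, 3, 4],
    "Col_B": [5, 6, 7, 8],
    "Col_C": [0, 0, 0, 0],
})
values_list = [12.34, 17.56, 12.45]  # one value fewer than the frame has rows

# A Series is aligned on the index, so the first three rows receive the values
# and the unmatched fourth row becomes NaN instead of raising a ValueError.
df["Col_C"] = pd.Series(values_list, index=df.index[:len(values_list)])
print(df)
```

Note that this replaces the whole column, so the previous value in the unmatched row is not kept.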
I'm trying to copy/append a dataframe with multiple column headers (similar to the one below) to an existing Excel sheet, starting from a particular cell, AA2:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'sub1': [np.nan,'E',np.nan,'S'],
                    'sub2': [np.nan,'D',np.nan,'A']})
df2 = pd.DataFrame({'sub1': [np.nan,'D',np.nan,'S'],
                    'sub2': [np.nan,'C',np.nan,'S']})
df = pd.concat({'Af':df1, 'Dp':df2}, axis=1)
df
I'm thinking of exporting this dataframe to an Excel file starting at that cell and then using openpyxl to copy the data from one workbook to the other, column by column, but I'm not sure that is the correct approach. Any ideas?
(The Excel sheet I'm working with has formatting, so I can't read it into a dataframe and use merge.)
I've had success manipulating Excel files in the past with xlsxwriter (you will need to pip install this as a dependency first - although it does not need to be explicitly imported).
import io
import pandas as pd
# Load your file here instead
file_bytes = io.BytesIO()
with pd.ExcelWriter(file_bytes, engine = 'xlsxwriter') as writer:
    # Write a DataFrame to Excel into specific cells
    pd.DataFrame().to_excel(
        writer,
        sheet_name = 'test_sheet',
        startrow = 10, startcol = 5,
        index = False
    )
    # Note: You can repeat any of these operations within the context manager
    # and keep adding stuff...
    # Add some text to cells as well:
    writer.sheets['test_sheet'].write('A1', 'Your text goes here')
file_bytes.seek(0)
# Then write your bytes to a file...
# Overwriting it in your case?
Bonus:
You can add plots too: write them to a BytesIO object, call <your_image_bytes>.seek(0), and then pass the object to the insert_image() function.
... # still inside ExcelWriter context manager
plot_bytes = io.BytesIO()
# Create plot in matplotlib here
plt.savefig(plot_bytes, format='png') # Instead of plt.show()
plot_bytes.seek(0)
writer.sheets['test_sheet'].insert_image(
    5, # Row start
    5, # Col start
    'some_image_name.png',
    options = {'image_data': plot_bytes}
)
The full documentation is really helpful too:
https://xlsxwriter.readthedocs.io/working_with_pandas.html
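For the AA2 case in the question, note that xlsxwriter only creates new files, so writing into an existing formatted workbook needs a different engine. The sketch below is one possible approach, assuming pandas 1.4 or later with openpyxl, where ExcelWriter accepts mode="a" and if_sheet_exists="overlay". The helper cell_to_offsets, the file existing.xlsx, and its "keep me" cell are made up for illustration. The index is kept in the output because older pandas versions refuse to write MultiIndex columns with index=False.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.utils.cell import column_index_from_string, coordinate_from_string


def cell_to_offsets(cell):
    """Turn an A1-style reference such as 'AA2' into zero-based (startrow, startcol)."""
    col_letters, row_number = coordinate_from_string(cell)
    return row_number - 1, column_index_from_string(col_letters) - 1


# Stand-in for the existing, already formatted workbook.
path = os.path.join(tempfile.mkdtemp(), "existing.xlsx")
wb = Workbook()
wb.active.title = "Sheet1"
wb.active["A1"] = "keep me"
wb.save(path)

# The frame from the question, with two header levels.
df1 = pd.DataFrame({"sub1": [np.nan, "E", np.nan, "S"],
                    "sub2": [np.nan, "D", np.nan, "A"]})
df2 = pd.DataFrame({"sub1": [np.nan, "D", np.nan, "S"],
                    "sub2": [np.nan, "C", np.nan, "S"]})
df = pd.concat({"Af": df1, "Dp": df2}, axis=1)

startrow, startcol = cell_to_offsets("AA2")
# "overlay" writes into the existing sheet instead of replacing it.
with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="overlay") as writer:
    df.to_excel(writer, sheet_name="Sheet1", startrow=startrow, startcol=startcol)

# Inspect the result: the old cell is still there and row 2 holds the top header.
ws = load_workbook(path)["Sheet1"]
row2 = [c.value for c in ws[2]]
print(ws["A1"].value, row2)
```

Because the index is written too, the block starts with the index column at AA; the data headers begin one column to the right.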
Thank you for reading my post and I appreciate your help!
I am trying to complete the steps below using Python:
1. Copy 5 Excel .xlsx files from a folder as the data source (all 5 files have only 1 sheet each).
2. Paste those files into one workbook as 5 separate worksheets.
3. Make updates on every sheet (for example, on sheet1 I need to sum a specific column) and then write the modified sheets back to the same workbook.
Issue: when I write back to the original file, it replaces the workbook created in step 2. I searched far and wide here, and the advice is to change the writer engine to openpyxl; however, when I change to openpyxl I get a "zipfile.BadZipFile: File is not a zip file" error.
import openpyxl
import pandas as pd
import os
from openpyxl import Workbook
import xlsxwriter as xw
import openpyxl as xl
from openpyxl.utils import get_column_letter
import numpy as np
# step 1: copy sheets from data folder into master workbook
# opening the source excel file
df1 = pd.read_excel(r"file1path.xlsx")
df2 = pd.read_excel(r"file2path.xlsx")
df3 = pd.read_excel(r"file3path.xlsx")
df4 = pd.read_excel(r"file4path.xlsx")
df5 = pd.read_excel(r"file5path.xlsx")
dest_filename = r'masterfilepath.xlsx'
# opening the destination excel file
writer = pd.ExcelWriter(dest_filename, engine='xlsxwriter')
# Write to master workbook.
df1.to_excel(writer, sheet_name='W.5 Revenue',index= False)
df2.to_excel(writer, sheet_name='W.4 Rev details',index= False)
df3.to_excel(writer, sheet_name='W.7 Accrual',index= False)
df4.to_excel(writer, sheet_name='W.6 Adhoc',index= False)
df5.to_excel(writer, sheet_name='W.8 State',index= False)
# Close the Pandas Excel writer and output the Excel file.
writer.save()
# Filter rows account number beginning with 4 on file1(df1)
df1 = df1[df1["Ledger account"].str.startswith('4')]
writer = pd.ExcelWriter(dest_filename, engine='openpyxl')
if os.path.exists(dest_filename):
    book = openpyxl.load_workbook(dest_filename)
    writer.book = book
df1.to_excel(writer, sheet_name="W5")
writer.save()
writer.close()
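A likely cause of the BadZipFile error, assuming a reasonably recent pandas, is that creating pd.ExcelWriter(dest_filename, engine='openpyxl') in its default write mode opens the file for writing and empties it, so the load_workbook call that follows no longer finds a valid .xlsx archive. One way to sidestep the problem is to make all updates on the dataframes in memory and write the master workbook only once. The sketch below uses two small made-up frames in place of the five source files and a temporary file named master.xlsx.

```python
import os
import tempfile

import pandas as pd

# Stand-ins for the frames read from the source files (sample data is made up).
frames = {
    "W.5 Revenue": pd.DataFrame({
        "Ledger account": ["4000", "5000", "4100"],
        "Amount": [10.0, 20.0, 30.0],
    }),
    "W.4 Rev details": pd.DataFrame({"Detail": ["a", "b"]}),
}

# Apply the updates in memory, before anything is written to disk.
revenue = frames["W.5 Revenue"]
frames["W.5 Revenue"] = revenue[revenue["Ledger account"].str.startswith("4")]

# Write every sheet once, so no existing workbook has to be reopened.
dest_filename = os.path.join(tempfile.mkdtemp(), "master.xlsx")
with pd.ExcelWriter(dest_filename, engine="openpyxl") as writer:
    for sheet_name, frame in frames.items():
        frame.to_excel(writer, sheet_name=sheet_name, index=False)

# Read the workbook back as text to check what was written.
written = pd.read_excel(dest_filename, sheet_name=None, dtype=str)
print(list(written))
```

If the workbook really has to be modified after it exists, opening the writer with mode="a" (openpyxl engine) avoids truncating the file.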
I'm currently doing an ETL process and am stuck with this data frame to excel issue. (Removed image tag due to lack of reputation)
I have an excel file template which looks like this https://i.imgur.com/VEHQHQF.png
Running a dataframe with values inside, I want to open up that template,(there will be future data in it) and dump the dataframe inside so ideally it'll look like this:
https://i.imgur.com/WPhLJV4.png
However, when I run my code, I keep getting this output: https://i.imgur.com/JhxkyWS.png
The table formatting stops at cell C2.
Is there any way to push the data INTO the table formatting of the template (appending values)?
I have tried openpyxl and pandas' built-in to_excel, but neither worked. I keep getting the same result, where the table format stops at the 2nd row.
I have also tried adding/removing the header in my data frame to match the header in the Excel file, but it made no difference.
My current code:
import pandas as pd
import xlsxwriter
import openpyxl
import os,sys
from openpyxl import load_workbook
d = {'this': [1, 2, 3], 'is': [4, 5, 6], 'test': [7, 8, 9]}
df = pd.DataFrame(data = d)
file_descr = 'test.xlsx'
def write_data(self, file_descr):
    """
    Use dataframe to_excel to write into file_descr (filename) - open first if file exists.
    """
    if os.path.isfile(file_descr):
        book = load_workbook(file_descr)
        writer = pd.ExcelWriter(file_descr, engine='openpyxl')
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
        df.to_excel(writer, sheet_name='Sheet1', index=False, header= True,
                    float_format='%.2f')
        ## Using xlswriter as an engine
        # writer = pd.ExcelWriter(file_descr,engine = 'xlsxwriter')
        # df.to_excel(writer,index=False,sheet_name = 'Sheet1')
        # workbook = writer.book
        writer.save()
    else:
        self.data_df.to_excel(file_descr, sheet_name='Sheet1', index=False,
                              float_format='%.2f')
write_data(df,file_descr)
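One possible explanation for the formatting stopping at row 2 is that an Excel table covers a fixed cell range, and writing cells below it does not extend that range. A sketch of a workaround with openpyxl alone is shown below: write the values, then widen the table's ref to cover them. It assumes openpyxl 3.0.4 or later (where ws.tables behaves like a dict) and a template whose table is named Table1 and spans A1:C2. Both the name and the temporary template.xlsx built here are assumptions for illustration; in a real template, ws.tables.keys() lists the actual names.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.utils import get_column_letter
from openpyxl.worksheet.table import Table

# Build a stand-in template: a header row plus a table covering A1:C2.
path = os.path.join(tempfile.mkdtemp(), "template.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws.append(["this", "is", "test"])
ws.add_table(Table(displayName="Table1", ref="A1:C2"))
wb.save(path)

df = pd.DataFrame({"this": [1, 2, 3], "is": [4, 5, 6], "test": [7, 8, 9]})

# Reopen the template and write the values directly under the header row.
wb = load_workbook(path)
ws = wb["Sheet1"]
first_data_row = 2
for r, row in enumerate(df.values.tolist(), start=first_data_row):
    for c, value in enumerate(row, start=1):
        ws.cell(row=r, column=c, value=value)

# Grow the table (and its filter range, if any) to cover the new rows.
table = ws.tables["Table1"]
last_row = first_data_row + len(df) - 1
table.ref = f"A1:{get_column_letter(df.shape[1])}{last_row}"
if table.autoFilter is not None:
    table.autoFilter.ref = table.ref
wb.save(path)

# Reload to confirm the saved table range and cell contents.
reloaded = load_workbook(path)["Sheet1"]
new_ref = reloaded.tables["Table1"].ref
print(new_ref)
```

The table style stored in the template is left untouched; only the range it applies to changes.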