I have a dataframe like the one shown below:
Date,cust,region,Abr,Number,
12/01/2010,Company_Name,Somecity,Chi,36,
12/02/2010,Company_Name,Someothercity,Nyc,156,
df = pd.read_clipboard(sep=',')
I would like to write this dataframe to a specific sheet (called temp_data) in the file output.xlsx.
Therefore I tried the below:
import pandas
from openpyxl import load_workbook
book = load_workbook('output.xlsx')
writer = pandas.ExcelWriter('output.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
I also tried the below
path = 'output.xlsx'
with pd.ExcelWriter(path) as writer:
    writer.book = openpyxl.load_workbook(path)
    final_df.to_excel(writer, sheet_name='temp_data',startrow=10)
writer.save()
But I am not sure whether I am overcomplicating it. I get the error shown below, although I verified in Task Manager that no Excel file or task is running:
BadZipFile: File is not a zip file
Moreover, I also lose the formatting of output.xlsx when I manage to write the file based on the suggestions below. I already have a neatly formatted file (fonts, colors, etc.) and just need to put the data inside.
Is there any way to write a pandas dataframe to a specific sheet in an existing Excel file WITHOUT LOSING THE FORMATTING OF THE DESTINATION FILE?
You just need to use to_excel from the pandas dataframe.
Try the snippet below:
df1.to_excel("output.xlsx",sheet_name='Sheet_name')
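If there is existing data, please try the snippet below:
# read the existing data first so we know which row to start writing at
reader = pd.read_excel('output.xlsx')
# open the existing workbook before the writer touches the file
book = load_workbook('output.xlsx')
writer = pd.ExcelWriter('output.xlsx', engine='openpyxl')
writer.book = book
# tell the writer which sheets already exist so it does not create new ones
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer,index=False,header=False,startrow=len(reader)+1)
writer.save()
writer.close()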
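On newer pandas releases the same idea can be sketched with the writer's append mode, which avoids assigning writer.book by hand. This is only a sketch under assumptions: it needs pandas 1.4 or later (for if_sheet_exists='overlay') with openpyxl installed, and the helper name write_df_to_sheet is made up for illustration. As far as I can tell, the BadZipFile error in the question is consistent with the writer emptying output.xlsx before load_workbook reads it, but the thread does not confirm that.

```python
import pandas as pd


def write_df_to_sheet(df, path, sheet_name, startrow=0, startcol=0):
    """Write df into a sheet of an existing workbook (hypothetical helper).

    mode='a' makes pandas load the existing workbook instead of starting
    a new one, and if_sheet_exists='overlay' writes into the named sheet
    rather than replacing or renaming it.
    """
    with pd.ExcelWriter(
        path, engine="openpyxl", mode="a", if_sheet_exists="overlay"
    ) as writer:
        df.to_excel(
            writer,
            sheet_name=sheet_name,
            startrow=startrow,  # 0-based: startrow=10 starts at Excel row 11
            startcol=startcol,
            index=False,
            header=False,  # pandas would apply its own style to header cells
        )
```

For the question's case this would be called as write_df_to_sheet(df, 'output.xlsx', 'temp_data', startrow=10). openpyxl can still drop content it does not model when it saves the workbook (its documentation mentions shapes, for instance), so it is worth trying this on a copy of the formatted file first.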
Are you restricted to using pandas or openpyxl?
If you're comfortable using other libraries, the easiest way is probably to use win32com to puppet Excel as if you were a user manually copying and pasting the information over.
import pandas as pd
import io
import win32com.client as win32
import os
csv_text = """Date,cust,region,Abr,Number
12/01/2010,Company_Name,Somecity,Chi,36
12/02/2010,Company_Name,Someothercity,Nyc,156"""
df = pd.read_csv(io.StringIO(csv_text),sep = ',')
temp_path = r"C:\Users\[User]\Desktop\temp.xlsx" #temporary location where to write this dataframe
df.to_excel(temp_path,index = False) #temporarily write this file to excel, change the output path as needed
excel = win32.Dispatch("Excel.Application")
excel.Visible = True #Switch these attributes to False if you'd prefer Excel to be invisible while executing this script
excel.ScreenUpdating = True
temp_wb = excel.Workbooks.Open(temp_path)
temp_ws = temp_wb.Sheets("Sheet1")
output_path = r"C:\Users\[User]\Desktop\output.xlsx" #Path to your output excel file
output_wb = excel.Workbooks.Open(output_path)
output_ws = output_wb.Sheets("Output_sheet")
temp_ws.Range('A1').CurrentRegion.Copy(Destination = output_ws.Range('A1')) # Feel free to modify the Cell where you'd like the data to be copied to
input('Check that output looks like you expected\n') # Added pause here to make sure script doesn't overwrite your file before you've looked at the output
temp_wb.Close()
output_wb.Close(True) #Close output workbook and save changes
excel.Quit() #Close excel
os.remove(temp_path) #Delete temporary excel file
Let me know if this achieves what you were after.
I spent all day on this (and a co-worker of mine spent even longer). Thankfully, it seems to work for my purposes: pasting a dataframe into an Excel sheet without changing any of the Excel source formatting. It requires the pywin32 package, which "drives" Excel as if it were a user, using the same object model as VBA.
import pandas as pd
from win32com import client
# Grab your source data any way you please - I'm defining it manually here:
df = pd.DataFrame([
['LOOK','','','','','','','',''],
['','MA!','','','','','','',''],
['','','I pasted','','','','','',''],
['','','','into','','','','',''],
['','','','','Excel','','','',''],
['','','','','','without','','',''],
['','','','','','','breaking','',''],
['','','','','','','','all the',''],
['','','','','','','','','FORMATTING!']
])
# Copy the df to clipboard, so we can later paste it as text.
df.to_clipboard(index=False, header=False)
excel_app = client.gencache.EnsureDispatch("Excel.Application") # Initialize instance
wb = excel_app.Workbooks.Open("Template.xlsx") # Load your (formatted) template workbook
ws = wb.Worksheets(1) # First worksheet becomes active - you could also refer to a sheet by name
ws.Range("A3").Select() # Only select a single cell using Excel nomenclature, otherwise this breaks
ws.PasteSpecial(Format='Unicode Text') # Paste as text
wb.SaveAs("Updated Template.xlsx") # Save our work
excel_app.Quit() # End the Excel instance
In general, when using the win32com approach, it's helpful to record yourself (with a macro) doing what you want to accomplish in Excel, then reading the generated macro code. Often this will give you excellent clues as to what commands you could invoke.
The solution to your problem exists here: How to save a new sheet in an existing excel file, using Pandas?
To add a new sheet from a df:
import pandas as pd
from openpyxl import load_workbook
import os
import numpy as np
os.chdir(r'C:\workdir')
path = 'output.xlsx'
book = load_workbook(path)
writer = pd.ExcelWriter(path, engine = 'openpyxl')
writer.book = book
### replace with your df ###
x = np.random.randn(100, 2)
df = pd.DataFrame(x)
df.to_excel(writer, sheet_name = 'x')
writer.save()
writer.close()
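As far as I know, pandas 2.x no longer allows assigning writer.book or calling writer.save(), so on those versions the same "add a sheet" step might look like the sketch below. It assumes openpyxl is installed; the helper name add_sheet is hypothetical.

```python
import pandas as pd


def add_sheet(df, path, sheet_name):
    """Add df as a new sheet to an existing workbook (hypothetical helper)."""
    # With mode='a' the default if_sheet_exists='error' raises ValueError
    # when sheet_name is already present, so nothing is overwritten silently.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
        df.to_excel(writer, sheet_name=sheet_name)
```

Leaving the with block saves and closes the file, so no explicit save() or close() call is needed.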
You can try xltpl.
Create a template file based on your output.xlsx file.
Render a file with your data.
from xltpl.writerx import BookWriterx
writer = BookWriterx('template.xlsx')
d = {'rows': df.values}
d['tpl_name'] = 'tpl_sheet'
d['sheet_name'] = 'temp_data'
writer.render_sheet(d)
d['tpl_name'] = 'other_sheet'
d['sheet_name'] = 'other'
writer.render_sheet(d)
writer.save('out.xls')
See examples.
I have an issue with the use of Pandas + ExcelWriter + load_workbook.
My need is to be able to modify data from an existing excel file (without deleting the rest).
It works partly, but the size of the produced file is quite different from that of the original.
Moreover, some properties seem to be missing, which leads to an error message when I want to integrate the modified file into an application.
The code below:
data_filtered = pd.DataFrame([date, date, date, date], index=[2,3,4,5])
book = openpyxl.load_workbook(file_origin)
writer = pd.ExcelWriter(file_modif, engine='openpyxl',datetime_format='dd/mm/yyyy hh:mm:ss', date_format='dd/mm/yyyy')
writer.book = book
## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
data_filtered.to_excel(writer, sheet_name="PCA pour intégration", index=False, startrow=2, startcol=5, header=False, verbose=True)
writer.save()
My code works exactly as I would like: it takes the data from the df and inserts it into the desired Excel file while skipping the appropriate rows. However, when I call .save(), other sheets that reference the data (mostly through pivots) seem to break, even though they were not touched by the writer. If I insert the data into another Excel file and copy and paste the exact same data to where the Python code puts it, the corresponding sheets do not break and display the correct information. How do you stop other sheets from breaking when Python writes to the file?
filename_in = 'File Location In'
filename_out = 'File Location Out'
sheet_name = 'Detail'
pos_detail_data_df.to_excel(filename_in, sheet_name=sheet_name, header = False, index = False)
df = pd.read_excel(filename_in, sheet_name=sheet_name)
book = load_workbook(filename_in)
writer = pd.ExcelWriter(filename_out, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
writer.sheets
df.to_excel(writer, sheet_name, index=False, startrow = 2, header = False)
writer.save()
Edit:
The code was updated to reflect the assistance from below. However, now the process simply removes everything from my filename_out and replaces it with only the sheets from filename_in.
I found an Excel file with a slicer so I took a look.
Sample file:
Site: https://www.contextures.com/excelpivottableslicers.html#download
Try:
import pandas as pd
from openpyxl import load_workbook
# sample Excel file with slicers.
# if required download and unzip and put in the folder with this script
sample_file = 'https://www.contextures.com/pivotsamples/regionsalesslicer.zip'
# set your filename_in, filename_out, and sheet_name
filename_in = 'regionsalesslicer.xlsx'
filename_out = 'regionsalesslicerUpdated.xlsx'
sheet_name = 'Sales Data'
# read in the Excel file with pd.read_excel rather than pd.ExcelFile
# just to play safe and avoid any BadZipFile: File is not a zip file errors
df = pd.read_excel(filename_in, sheet_name=sheet_name)
################## WHATEVER YOU WANT BELOW UNTIL LINE 37 ##################
# check the contents
print(df.head(2), '\n')
# make a change (or changes) to your df
# in the case just swap 'Carrot' for 'Orange' in the 'Product' column
df.loc[df['Product'] == 'Carrot', 'Product'] = 'Orange'
# check the contents after the change
print(df.head(2), '\n')
# as long as you have imported from the top two lines and read the file
# and not called ExcelWriter before this point all the other lines above
# are up to you.
################## WHATEVER YOU NEED ABOVE AFTER LINE 15 ##################
# from this point on try...
book = load_workbook(filename_in)
writer = pd.ExcelWriter(filename_out, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name, index=False)
writer.save()
In the resulting file (in the example code above we used filename_out = 'regionsalesslicerUpdated.xlsx'), the slicers still work.
Example:
Shows 'Orange'. Let's refresh the data...
Slicer/filter shows 'Orange'...
Exporting from pandas to Excel has not deleted any of the sheets etc...
We have successfully overwritten a dataframe to an existing sheet in Excel.
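The claim that no sheets were deleted can be checked mechanically. A minimal sketch, assuming openpyxl is installed; the helper names sheet_names and missing_sheets are made up here, and the check only compares sheet titles, so it says nothing about whether slicers or pivots still work.

```python
from openpyxl import load_workbook


def sheet_names(path):
    """Return the sheet titles of a workbook, opened in read-only mode."""
    book = load_workbook(path, read_only=True)
    try:
        return list(book.sheetnames)
    finally:
        book.close()


def missing_sheets(path_before, path_after):
    """List sheets present in path_before but absent from path_after."""
    after = set(sheet_names(path_after))
    return [name for name in sheet_names(path_before) if name not in after]
```

With the file names from the example above, missing_sheets(filename_in, filename_out) returning an empty list would mean every original sheet is still there.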
There is no way to do this if you are writing directly to the sheet unless you would like to pay for xlwings. A better (and easier to manage) solution is to change the way you collect your data from Excel, which also won't break any dashboards or slicers you have. It will require some adjustments to your overall data pipeline and how you process it, but that is a one-time change that will pay dividends in the future.
Instead of writing directly to a sheet in the file, you can write to a separate file altogether.
df.to_excel(writer_path_to_seperate_sheet, sheet_name, index=False)
From Excel you can now import this file (and every other file that you may write to the folder in the future) via Power Query.
Select either the file with your data or, preferably, the folder which will contain your file and all future files. Click Combine and Transform.
Once you complete this step, you can adjust your data set to your liking and load it. It will be a table by default (perfect for pivot tables and anything else). When new files are written to the folder, you simply click Refresh on the table data set and voilà: all slicers and other dashboards/pivots are left unaffected.
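The Python side of this workflow could be sketched as follows. The function name export_for_power_query, the file-name pattern, and the sheet name 'data' are all assumptions for illustration; the only point is that each run drops a new file into the folder that Power Query watches.

```python
from datetime import datetime
from pathlib import Path

import pandas as pd


def export_for_power_query(df, folder, prefix="data"):
    """Write df to a new timestamped .xlsx file in folder (hypothetical helper)."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    # One file per run; two runs within the same second would share a name
    # and the later one would overwrite the earlier one.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = folder / f"{prefix}_{stamp}.xlsx"
    df.to_excel(target, sheet_name="data", index=False)
    return target
```

Because the formatted workbook is never opened by Python in this setup, its pivots and slicers are only ever modified by Excel itself.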
I am reading multiple xml files, extracting some data, and then forming a pandas DataFrame from that data. These are the main steps that I follow:
open an xml file
extract some elements
create a pandas dataframe with the extracted elements
append the results to the Excel file named "output.xlsx" (using the Python code below)
These steps are repeated for all the xml files that I have (15 GB of initial data, which usually yields 100 MB of valuable text data).
This is my python code for appending data frames in the output excel file:
book = load_workbook('output.xlsx')
writer = pd.ExcelWriter('output.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = {ws.title: ws for ws in book.worksheets}
startrow = writer.sheets['Sheet1'].max_row
output.to_excel(writer, startrow=startrow,index = False, header = False)
writer.save()
When I open the "output.xlsx" in Excel, I receive a prompt message saying "We found a problem with some content in "output.xlsx". Do you want us to try to recover as much as we can?" with a yes or no answer
This is the log file that excel generates:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<logFileName>error001280_01.xml</logFileName><summary>Errors were detected in
file 'D:\JUPYWORKDIR\2009Results\output.xlsx'</summary><repairedRecords>
<repairedRecord>Repaired Records: String properties from /xl/worksheets/sheet1.xml part
</repairedRecord></repairedRecords></recoveryLog>
I am worried that saving my results in Excel format is corrupting my data. I will read "output.xlsx" with pandas in the future in order to do some data analysis. Does this problem affect my future analysis? I would like to know why this problem occurs and whether I should save my data as CSV instead. Any suggestions?
P.S. When I check the last row of "output.xlsx" using Python code, it matches the number of rows I get when I import the Excel file into a pandas DataFrame. The "recovered file" from Microsoft Excel also has the same number of rows, so I think it's a generic Microsoft Excel error caused by the large amount of data, but I am not sure.
I had the same issue and spent at least an hour searching for a fix. Only one change was needed: instead of using writer.save(), try using writer.close() and it should resolve the issue.
Modified the above mentioned code:
options = {}
options['strings_to_formulas'] = False
options['strings_to_urls'] = False
book = load_workbook('output.xlsx')
writer = pd.ExcelWriter('output.xlsx', engine='openpyxl',options=options)
writer.book = book
writer.sheets = {ws.title: ws for ws in book.worksheets}
startrow = writer.sheets['Sheet1'].max_row
output.to_excel(writer, startrow=startrow,index = False, header = False)
writer.close()
I think it's an Excel problem with handling big data, because I opened the file with OpenOffice spreadsheets and it doesn't show any error.
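On the question of saving as CSV instead: a minimal sketch of appending each batch to a CSV file, which sidesteps the workbook round trip entirely. The helper name append_to_csv is hypothetical, and this does not explain the repair prompt above; it only shows the alternative the asker mentions. One relevant fact is that an .xlsx sheet holds at most 1,048,576 rows, while a CSV file has no such limit.

```python
import os

import pandas as pd


def append_to_csv(df, path):
    """Append df to a CSV file, writing the header only once (hypothetical helper)."""
    # Write the header only when the file is new or still empty.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    df.to_csv(path, mode="a", header=write_header, index=False)
```

Reading the result back later is then a single pd.read_csv(path) call, and every batch must have the same columns in the same order for the file to stay consistent.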
I have a workbook called TEMPLATE.xlsx. In this workbook I have two tabs, ALL_DATA_RAW and WEEKLY_DATA_RAW. I get my data from an API and feed it into the WEEKLY_DATA_RAW tab by opening the TEMPLATE workbook, deleting WEEKLY_DATA_RAW, then recreating that same tab and storing the df from the API in it.
book = openpyxl.load_workbook('TEMPLATE.xlsx')
writer = pd.ExcelWriter('TEMPLATE.xlsx', engine='openpyxl')
writer.book = book
book.remove(book.get_sheet_by_name('WEEKLY_DATA_RAW'))
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, "WEEKLY_DATA_RAW", index = False)
writer.save()
First question: is there a way I can accomplish this without deleting and recreating WEEKLY_DATA_RAW? I would prefer to clear the current data in it and then store the df in it.
Second question: after I store the data in WEEKLY_DATA_RAW, I also have to append that data at the bottom of the ALL_DATA_RAW tab.
How do I go about this?
For your first issue, you can create a temporary variable to hold all your data without changing it. The second issue, if I'm understanding correctly, is about combining/concatenating data from Excel files. Look at this video and let me know if that is what you're looking for: https://www.youtube.com/watch?v=kWaerL6-OiU
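Both steps from the question can also be sketched directly with openpyxl, without deleting the tab. This is a sketch under assumptions: the function name refresh_weekly_and_append is made up, the weekly sheet is assumed to contain no merged cells, and ALL_DATA_RAW is assumed to already hold a header row with the same column order as the df.

```python
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows


def refresh_weekly_and_append(df, path,
                              weekly="WEEKLY_DATA_RAW",
                              all_data="ALL_DATA_RAW"):
    """Overwrite the weekly sheet in place, then append df to the history sheet."""
    book = load_workbook(path)
    weekly_ws = book[weekly]
    all_ws = book[all_data]

    # Blank the old values but keep the sheet (and its cell styles) in place.
    for row in weekly_ws.iter_rows():
        for cell in row:
            cell.value = None

    # Write header plus data starting at A1 of the weekly sheet.
    rows = dataframe_to_rows(df, index=False, header=True)
    for r, values in enumerate(rows, start=1):
        for c, value in enumerate(values, start=1):
            weekly_ws.cell(row=r, column=c, value=value)

    # Append only the data rows (no header) below the existing history.
    for values in dataframe_to_rows(df, index=False, header=False):
        all_ws.append(values)

    book.save(path)
```

Calling refresh_weekly_and_append(df, 'TEMPLATE.xlsx') would then replace the delete-and-recreate code in the question.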