I have an issue using pandas + ExcelWriter + load_workbook.
I need to modify data in an existing Excel file without deleting the rest of its content.
It partly works, but the size of the produced file is quite different from the size of the original.
Moreover, the new file seems to lack some properties, which leads to an error message when I try to import the modified file into an application.
The code is below:
data_filtered = pd.DataFrame([date, date, date, date], index=[2,3,4,5])
book = openpyxl.load_workbook(file_origin)
writer = pd.ExcelWriter(file_modif, engine='openpyxl',datetime_format='dd/mm/yyyy hh:mm:ss', date_format='dd/mm/yyyy')
writer.book = book
## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
data_filtered.to_excel(writer, sheet_name="PCA pour intégration", index=False, startrow=2, startcol=5, header=False, verbose=True)
writer.save()
Related
I have an Excel sheet "Calcs" with one column named "old". I'm trying to add a new column "new" with a fixed value of "1" to the existing sheet "Calcs", using the code below, which results in two issues:
1. It does not update the existing sheet; instead it creates a new sheet called "Calcs1".
2. After the code has run, opening the Excel file gives the following error (there is no such error when opening the file before running the code):
We found a problem with some content in 'test1.xlsx'. Do you want us
to try to recover as much as we can? if you trust the source of this
workbook, click Yes.
Appreciate any help
import pandas as pd
from openpyxl import load_workbook
file = r"C:\test1.xlsx"
df2 = pd.read_excel(file, sheet_name = 'Calcs')
df2["new"] = "1"
book = load_workbook(file)
writer = pd.ExcelWriter(file, engine = 'openpyxl')
writer.book = book
df2.to_excel(writer, sheet_name = 'Calcs')
writer.save()
writer.close()
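One possible way around the "Calcs1" sheet, sketched here under the assumption that pandas 1.3 or later is installed (that is where the `if_sheet_exists` option was added): open the workbook in append mode and ask pandas to replace the sheet. The temporary file below is only a hypothetical stand-in for C:\test1.xlsx, and this is an illustration rather than a confirmed fix for the recovery error.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for C:\test1.xlsx: one sheet "Calcs" with a column "old".
path = os.path.join(tempfile.mkdtemp(), "test1.xlsx")
pd.DataFrame({"old": [10, 20, 30]}).to_excel(
    path, sheet_name="Calcs", index=False, engine="openpyxl"
)

df2 = pd.read_excel(path, sheet_name="Calcs")
df2["new"] = "1"

# mode="a" opens the existing workbook; if_sheet_exists="replace" swaps the
# sheet in place instead of adding "Calcs1". The with-block saves once on exit,
# so no separate save()/close() calls are needed.
with pd.ExcelWriter(
    path, engine="openpyxl", mode="a", if_sheet_exists="replace"
) as writer:
    df2.to_excel(writer, sheet_name="Calcs", index=False)

# Read the result back to check which sheets and columns exist now.
with pd.ExcelFile(path) as xls:
    sheet_names = xls.sheet_names
result = pd.read_excel(path, sheet_name="Calcs")
```

Note that "replace" drops the old sheet object, so any cell formatting on "Calcs" is not carried over.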
My code works exactly as I would like: it takes the data from the df and inserts it into the desired Excel file while skipping the appropriate rows. However, when I call .save(), other sheets that reference the data (mostly through pivots) seem to break, even though the writer did not touch them. If I insert the data into another Excel file and copy and paste the exact same data to where the Python code puts it, the corresponding sheets do not break and display the correct information. How do you stop other sheets from breaking when Python writes to the file?
filename_in = 'File Location In'
filename_out = 'File Location Out'
sheet_name = 'Detail'
pos_detail_data_df.to_excel(filename_in, sheet_name=sheet_name, header = False, index = False)
df = pd.read_excel(filename_in, sheet_name=sheet_name)
book = load_workbook(filename_in)
writer = pd.ExcelWriter(filename_out, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
writer.sheets
df.to_excel(writer, sheet_name, index=False, startrow = 2, header = False)
writer.save()
Edit:
The code was updated to reflect the assistance from below. However, now the process will simply remove everything from my filename_out and replace it with only the sheets from filename_in
I found an Excel file with a slicer so I took a look.
Sample file:
Site: https://www.contextures.com/excelpivottableslicers.html#download
Try:
import pandas as pd
from openpyxl import load_workbook
# sample Excel file with slicers.
# if required download and unzip and put in the folder with this script
sample_file = 'https://www.contextures.com/pivotsamples/regionsalesslicer.zip'
# set your filename_in, filename_out, and sheet_name
filename_in = 'regionsalesslicer.xlsx'
filename_out = 'regionsalesslicerUpdated.xlsx'
sheet_name = 'Sales Data'
# read in the Excel file with pd.read_excel rather than pd.ExcelFile
# just to play safe and avoid any BadZipFile: File is not a zip file errors
df = pd.read_excel(filename_in, sheet_name=sheet_name)
################## YOUR OWN PROCESSING STARTS HERE ##################
# check the contents
print(df.head(2), '\n')
# make a change (or changes) to your df
# in the case just swap 'Carrot' for 'Orange' in the 'Product' column
df.loc[df['Product'] == 'Carrot', 'Product'] = 'Orange'
# check the contents after the change
print(df.head(2), '\n')
# as long as you have imported from the top two lines and read the file
# and not called ExcelWriter before this point all the other lines above
# are up to you.
################## YOUR OWN PROCESSING ENDS HERE ##################
# from this point on try...
book = load_workbook(filename_in)
writer = pd.ExcelWriter(filename_out, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name, index=False)
writer.save()
In the resulting file (in the example code above we used filename_out = 'regionsalesslicerUpdated.xlsx'), the slicers still work.
Example:
Shows 'Orange'. Let's refresh the data...
Slicer/filter shows 'Orange'...
Exporting from pandas to Excel has not deleted any of the sheets etc...
We have successfully overwritten a dataframe to an existing sheet in Excel.
There is no way to do this if you are writing directly to the sheet unless you would like to pay for xlwings. A better (and easier to manage) solution is to change the way you are collecting your data from Excel. It also won't break any dashboards or slicers you have. It will require some adjustments to your overall data pipeline and how you process it, but that is a one-time change that will pay dividends in the future.
Instead of writing directly to a sheet in the file, you can write to a separate file altogether.
df.to_excel(writer_path_to_seperate_sheet, sheet_name, index=False)
From Excel you can now import this file (and every other file that you may write to the folder in the future) via Power Query.
Select either the file with your data or, preferably, the folder which will contain your file and all future files. Click "Combine and Transform".
Once you complete this step, you can adjust your data set to your liking and load it. It will be a table by default (perfect for pivot tables and anything else). When new files are written to the folder, you simply click refresh on the table data set and voilà. All slicers and other dashboards/pivots are left unaffected.
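The Python side of this folder-based workflow could look roughly like the sketch below. The helper name `export_snapshot`, the sheet name "Data", and the timestamped file naming are all assumptions made for illustration; the only point is that each run writes a new workbook into the folder that Power Query watches instead of touching the dashboard file.

```python
import os
import tempfile
from datetime import datetime

import pandas as pd


def export_snapshot(df, folder, prefix="data"):
    """Write df to a new, uniquely named workbook in `folder` and return its path."""
    os.makedirs(folder, exist_ok=True)
    # A timestamp down to microseconds keeps successive exports from colliding.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    path = os.path.join(folder, f"{prefix}_{stamp}.xlsx")
    df.to_excel(path, sheet_name="Data", index=False)
    return path


# Usage with a temporary folder standing in for the Power Query source folder.
folder = tempfile.mkdtemp()
first = export_snapshot(pd.DataFrame({"x": [1, 2]}), folder)
second = export_snapshot(pd.DataFrame({"x": [3, 4]}), folder)
```

Keeping the same sheet name and column layout in every file is what lets the combine step in Excel stack them into one table.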
I have a workbook called TEMPLATE.xlsx. In this workbook I have two tabs, ALL_DATA_RAW and WEEKLY_DATA_RAW. I get my data from an API and feed it into the WEEKLY_DATA_RAW tab by opening the TEMPLATE workbook, deleting WEEKLY_DATA_RAW, then recreating that same tab and storing the df from the API in it.
book = openpyxl.load_workbook('TEMPLATE.xlsx')
writer = pd.ExcelWriter('TEMPLATE.xlsx', engine='openpyxl')
writer.book = book
book.remove(book.get_sheet_by_name('WEEKLY_DATA_RAW'))
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, "WEEKLY_DATA_RAW", index = False)
writer.save()
My first question: is there a way I can accomplish this without deleting and recreating WEEKLY_DATA_RAW? I would prefer to clear the current data in it and then store df in it.
My second question: after I store the data in WEEKLY_DATA_RAW, I also have to append that data to the bottom of the ALL_DATA_RAW tab.
How do I go about this?
For your first issue, you can create a temporary variable to hold all your data without changing it. For the second issue, if I'm understanding correctly, you want to combine/concatenate data from Excel files. Look at this video and let me know if that is what you're looking for: https://www.youtube.com/watch?v=kWaerL6-OiU
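Both steps from the question can also be done with openpyxl alone. The following is a minimal sketch, not a tested solution for the real TEMPLATE.xlsx: it builds a small hypothetical workbook with the two tab names from the question, clears WEEKLY_DATA_RAW with `delete_rows` instead of removing the sheet, writes the new frame into it, and appends the same rows to the bottom of ALL_DATA_RAW. The column names and values are made up.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

# Hypothetical stand-in for TEMPLATE.xlsx with the two tabs from the question.
path = os.path.join(tempfile.mkdtemp(), "TEMPLATE.xlsx")
template = Workbook()
all_ws = template.active
all_ws.title = "ALL_DATA_RAW"
weekly_ws = template.create_sheet("WEEKLY_DATA_RAW")
for ws in (all_ws, weekly_ws):
    ws.append(["id", "value"])
    ws.append([1, 10])
    ws.append([2, 20])
template.save(path)

df = pd.DataFrame({"id": [3], "value": [30]})  # stand-in for the API data

book = load_workbook(path)

# 1) Clear the weekly sheet (the sheet itself stays) and write header + rows.
weekly = book["WEEKLY_DATA_RAW"]
weekly.delete_rows(1, weekly.max_row)
rows = [list(df.columns)] + list(df.itertuples(index=False, name=None))
for r, row in enumerate(rows, start=1):
    for c, value in enumerate(row, start=1):
        weekly.cell(row=r, column=c, value=value)

# 2) Append the same data rows (no header) below the existing ALL_DATA_RAW rows.
all_data = book["ALL_DATA_RAW"]
for row in df.itertuples(index=False, name=None):
    all_data.append(row)

book.save(path)

# Reload to inspect what ended up in each sheet.
check = load_workbook(path)
weekly_rows = list(check["WEEKLY_DATA_RAW"].iter_rows(values_only=True))
all_rows = list(check["ALL_DATA_RAW"].iter_rows(values_only=True))
```

Because the sheet is never removed, references from other sheets to WEEKLY_DATA_RAW keep pointing at the same tab.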
I am trying to write a dataframe to an existing Excel document without having to load the workbook.
The reason is that my Excel file contains many pivot tables, which cannot be read into Python. Therefore I want to know whether it is possible to write directly to the Excel file (and a specific sheet) without having to load the workbook.
I have previously tried writing to an Excel file, but this simply deletes the other existing sheets and leaves only the data I have written. So, to clarify, I need to make sure the other sheets are not deleted.
Thanks in advance.
As background, this code will NOT work for me:
path = xxxxxxx
#loading workbook
book = load_workbook(path)
# book = load_workbook(path, read_only=False, keep_vba=True)
#object which writes to the workbook
writer = pd.ExcelWriter(path, engine='openpyxl')
#where object and loadworkbook are introduced
writer.book = book
#writing in all sheet names from the workbook
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
I am trying to use ExcelWriter to write/add some information to a workbook that contains multiple sheets.
The first time I call the function, it creates the workbook with some data. In the second call, I would like to add some information to the workbook at different locations in all sheets.
def Out_Excel(file_name,C,col):
    writer = pd.ExcelWriter(file_name,engine='xlsxwriter')
    for tab in tabs: # tabs here is provided from a different function that I did not write here to keep it simple and clean
        df = DataFrame(C) # the data is different for different sheets but I keep it simple in this case
        df.to_excel(writer,sheet_name = tab, startcol = 0 + col, startrow = 0)
    writer.save()
In the main code I call this function twice with different col to print out my data in different locations.
Out_Excel('test.xlsx',C,0)
Out_Excel('test.xlsx',D,10)
But the problem is that the output contains only the data from the second call, as if the function overwrites the entire workbook. I guess I need to load the workbook that already exists in this case?
Any help?
Use load_workbook from openpyxl - see the xlsxwriter and openpyxl docs:
import pandas as pd
from openpyxl import load_workbook
book = load_workbook('test.xlsx')
writer = pd.ExcelWriter('test.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name='tab_name')  # add any other to_excel parameters here
writer.save()
Pandas version 0.24.0 added the mode keyword, which allows you to append to excel workbooks without jumping through the hoops that we used to have to do. Just use mode='a' to append sheets to an existing workbook.
From the documentation:
with ExcelWriter('path_to_file.xlsx', mode='a') as writer:
    df.to_excel(writer, sheet_name='Sheet3')
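Applied to the Out_Excel question, append mode might be used as in the sketch below. It assumes pandas 1.4 or later, because it relies on `if_sheet_exists="overlay"` to write into a sheet that already exists without clearing it. The function name `out_excel`, the `frames` dictionary (sheet name to DataFrame), and the sample data are hypothetical.

```python
import os
import tempfile

import pandas as pd


def out_excel(file_name, frames, col):
    """Write each frame to its own sheet at column offset `col`, keeping existing cells."""
    if os.path.exists(file_name):
        # Append mode; "overlay" writes into existing sheets without clearing them.
        kwargs = {"mode": "a", "if_sheet_exists": "overlay"}
    else:
        # First call: the file does not exist yet, so create it.
        kwargs = {"mode": "w"}
    with pd.ExcelWriter(file_name, engine="openpyxl", **kwargs) as writer:
        for tab, df in frames.items():
            df.to_excel(writer, sheet_name=tab, startcol=col, startrow=0, index=False)


# Two calls with different column offsets, as in the question.
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")
out_excel(path, {"tab1": pd.DataFrame({"c": [1, 2]})}, 0)
out_excel(path, {"tab1": pd.DataFrame({"d": [3, 4]})}, 10)

combined = pd.read_excel(path, sheet_name="tab1")
```

The openpyxl engine is used for both calls because, as far as I know, the xlsxwriter engine cannot open an existing file in append mode.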
You could also try using the following method to create your Excel spreadsheet:
import pandas as pd
def generate_excel(csv_file, excel_loc, sheet_):
    # Read the CSV, then write it to a single sheet of a new workbook.
    data = pd.read_csv(csv_file, header=0, index_col=False)
    # The with-block saves and closes the file once, on exit.
    with pd.ExcelWriter(excel_loc) as writer:
        data.to_excel(writer, sheet_name=sheet_, index=False)
Give this a try and let me know what you think.