How do I edit spreadsheets using pandas or any other library?
I have a CSV that I read and filter, and I want to save the result into an existing XLSX workbook.
But when I write the dataframe to this workbook, the file is overwritten and all existing edits and sheets are lost.
This is what I'm trying:
excel_name = 'data/nessus/My Scans/Janeiro_2019/teste.xlsx'
writer = pd.ExcelWriter(excel_name, engine='xlsxwriter')
df5.to_excel(writer, sheet_name='FullExport', index=False)
workbook=writer.book
worksheet = writer.sheets['FullExport']
writer.save()
I think I'm doing something wrong, but I can not solve it.
PS:
This dataframe should be written to the sheet called "FullExport", starting at row 2.
In pandas version 0.24 there will be an option for mode='a'; for now, however, you will have to:
from openpyxl import load_workbook
writer = pd.ExcelWriter(excel_name, engine='openpyxl')
writer.book = load_workbook(excel_name)
df5.to_excel(writer, sheet_name='FullExport', index=False)
writer.save()
writer.close()  # I think close() already runs the save function above
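For later pandas versions, the append approach mentioned above might look like the following minimal sketch. It assumes pandas 1.4 or newer (needed for `if_sheet_exists='overlay'`) with openpyxl installed. The temporary workbook, the "Notes" sheet, and the sample dataframe are placeholders invented for the illustration, not part of the original question.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

# Build a throwaway workbook with two sheets to stand in for the real file.
tmp_dir = tempfile.mkdtemp()
excel_name = os.path.join(tmp_dir, "teste.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "FullExport"
ws["A1"] = "existing title row"
wb.create_sheet("Notes")["A1"] = "keep me"
wb.save(excel_name)

# Hypothetical sample data in place of the filtered CSV.
df5 = pd.DataFrame({"host": ["a", "b"], "risk": ["High", "Low"]})

# mode="a" opens the existing file instead of replacing it;
# if_sheet_exists="overlay" writes into the existing sheet (pandas >= 1.4).
with pd.ExcelWriter(excel_name, engine="openpyxl", mode="a",
                    if_sheet_exists="overlay") as writer:
    # startrow is zero-based, so startrow=1 begins at spreadsheet row 2.
    df5.to_excel(writer, sheet_name="FullExport", index=False, startrow=1)

result = load_workbook(excel_name)
print(result.sheetnames)
```

The context manager saves and closes the file on exit, so no explicit save() call is needed in this form.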
Related
I have an Excel sheet "Calcs" with one column named "old". I'm trying to add a new column "new" with a fixed value of "1" to the existing sheet "Calcs", but the code below causes two issues:
It does not update the existing sheet; instead it creates a new sheet called "Calcs1".
After the code runs, opening the Excel file produces the error below (there was no such error when opening the file before running the code).
We found a problem with some content in 'test1.xlsx'. Do you want us
to try to recover as much as we can? if you trust the source of this
workbook, click Yes.
Appreciate any help
import pandas as pd
from openpyxl import load_workbook
file = r"C:\test1.xlsx"
df2 = pd.read_excel(file, sheet_name = 'Calcs')
df2["new"] = "1"
book = load_workbook(file)
writer = pd.ExcelWriter(file, engine = 'openpyxl')
writer.book = book
df2.to_excel(writer, sheet_name = 'Calcs')
writer.save()
writer.close()
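A likely cause of the extra "Calcs1" sheet is that the writer is not told which sheets already exist, so to_excel creates a new sheet and openpyxl renames it to avoid a duplicate title. As a minimal sketch, assuming only a column needs to be added and the file contains nothing openpyxl cannot preserve, the sheet can also be edited in place with openpyxl alone. The temporary file and the sample values below are invented stand-ins for test1.xlsx.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Stand-in for test1.xlsx: a "Calcs" sheet with a single "old" column.
path = os.path.join(tempfile.mkdtemp(), "test1.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Calcs"
for row in (["old"], [10], [20]):
    ws.append(row)
wb.save(path)

# Reopen the file and add the column in place, without going through pandas.
book = load_workbook(path)
sheet = book["Calcs"]
new_col = sheet.max_column + 1                      # first free column
sheet.cell(row=1, column=new_col, value="new")      # header cell
for r in range(2, sheet.max_row + 1):
    sheet.cell(row=r, column=new_col, value="1")    # fixed value per data row
book.save(path)

check = load_workbook(path)
print(check.sheetnames)
```

Because the existing worksheet object is modified directly, no second sheet is created.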
My code works exactly as I would like: it takes the data from the df and inserts it into the desired Excel file while skipping the appropriate rows. However, when I call .save(), other sheets that reference the data (mostly through pivots) seem to break even though they were not touched by the writer. If I insert the data into another Excel file and then copy and paste the exact same data where the Python code puts it, the corresponding sheets do not break and display the correct information. How do I stop other sheets from breaking when Python writes to the file?
filename_in = 'File Location In'
filename_out = 'File Location Out'
sheet_name = 'Detail'
pos_detail_data_df.to_excel(filename_in, sheet_name=sheet_name, header = False, index = False)
df = pd.read_excel(filename_in, sheet_name=sheet_name)
book = load_workbook(filename_in)
writer = pd.ExcelWriter(filename_out, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
writer.sheets
df.to_excel(writer, sheet_name, index=False, startrow = 2, header = False)
writer.save()
Edit:
The code was updated to reflect the assistance from below. However, now the process will simply remove everything from my filename_out and replace it with only the sheets from filename_in
I found an Excel file with a slicer so I took a look.
Sample file:
Site: https://www.contextures.com/excelpivottableslicers.html#download
Try:
import pandas as pd
from openpyxl import load_workbook
# sample Excel file with slicers.
# if required download and unzip and put in the folder with this script
sample_file = 'https://www.contextures.com/pivotsamples/regionsalesslicer.zip'
# set your filename_in, filename_out, and sheet_name
filename_in = 'regionsalesslicer.xlsx'
filename_out = 'regionsalesslicerUpdated.xlsx'
sheet_name = 'Sales Data'
# read in the Excel file with pd.read_excel rather than pd.ExcelFile
# just to play safe and avoid any BadZipFile: File is not a zip file errors
df = pd.read_excel(filename_in, sheet_name=sheet_name)
################## WHATEVER YOU WANT BELOW UNTIL LINE 37 ##################
# check the contents
print(df.head(2), '\n')
# make a change (or changes) to your df
# in this case just swap 'Carrot' for 'Orange' in the 'Product' column
df.loc[df['Product'] == 'Carrot', 'Product'] = 'Orange'
# check the contents after the change
print(df.head(2), '\n')
# as long as you have imported from the top two lines and read the file
# and not called ExcelWriter before this point all the other lines above
# are up to you.
################## WHATEVER YOU NEED ABOVE AFTER LINE 15 ##################
# from this point on try...
book = load_workbook(filename_in)
writer = pd.ExcelWriter(filename_out, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name, index=False)
writer.save()
In the resulting file (in the example code above we used filename_out = 'regionsalesslicerUpdated.xlsx'), the slicers still work.
Example:
Shows 'Orange'. Let's refresh the data...
Slicer/filter shows 'Orange'...
Exporting from pandas to Excel has not deleted any of the sheets etc...
We have successfully overwritten a dataframe to an existing sheet in Excel.
There is no way to do this if you are writing directly to the sheet unless you would like to pay for xlwings. A better (and easier to manage) solution is to change the way you collect your data from Excel. It also won't break any dashboards or slicers you have. It will require some adjustments to your overall data pipeline and how you process it, but that is a one-time change that will pay dividends in the future.
Instead of writing directly to a sheet in the file, you can write to a separate file altogether.
df.to_excel(writer_path_to_seperate_sheet, sheet_name, index=False)
From Excel you can now import this file (and every other file that you may write to the folder in the future) via Power Query.
Select either the file with your data or, preferably, the folder which will contain your file and all future files. Click Combine and Transform.
Once you complete this step, you can adjust your data set to your liking and load it. It will be a table by default (perfect for pivot tables and anything else). When new files are written to the folder, you simply click Refresh on the table data set and voilà. All slicers and other dashboards/pivots are left unaffected.
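The "write to a separate file" step could be sketched roughly as follows. The helper name `export_to_folder`, the "exports" folder, the "Data" sheet name, and the timestamp naming scheme are all hypothetical choices for the illustration; the only point is that each run produces a new file in a folder that Power Query can be pointed at.

```python
import os
import tempfile
from datetime import datetime

import pandas as pd


def export_to_folder(df, folder, prefix="export"):
    """Write df to a new, timestamped workbook inside folder and return its path."""
    os.makedirs(folder, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    path = os.path.join(folder, f"{prefix}_{stamp}.xlsx")
    # Each call creates a fresh file, so the dashboard workbook is never touched.
    df.to_excel(path, sheet_name="Data", index=False)
    return path


# Hypothetical drop folder and sample data.
drop_folder = os.path.join(tempfile.mkdtemp(), "exports")
df = pd.DataFrame({"Product": ["Orange", "Apple"], "Units": [3, 5]})
written = export_to_folder(df, drop_folder)
print(written)
```

Keeping the same column layout in every exported file makes the folder import easier to combine on the Excel side.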
I've got a problem.
I want to write a dataframe to an existing Excel workbook which contains formulas.
When I open the workbook and use a writer with pandas, Excel always says there is unreadable content in it and that I need to repair it when I open the file.
Do you know how to resolve this?
Here is my code to write to the workbook:
def Writer():
    book = load_workbook(r'C:\Users\List.xlsx')
    writer = pd.ExcelWriter(r'C:\Users\List.xlsx',header=None,
                            index=False, data_only=True)
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    datafin=FindReqdata.datafin
    datafin.to_excel(writer, sheet_name="SheetName", startrow=2,
                     startcol=5, index=None, header=None)
    writer.save()
Writer()
Have a look at this: https://stackoverflow.com/a/38075046/14367973
If I understood your question, you want to append more rows to a .xlsx file.
The new rows come from a dataframe that has the same number of columns as the Excel file.
If that is what you are trying to do, the answer above should help you.
Keep the xlsx file closed while the script runs; having it open can sometimes break it.
OK, so apparently openpyxl has a problem with connections in the Excel spreadsheet. Because one sheet has connections in it, the file is broken after editing it. I am still trying to fix this bug.
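One way to check in advance whether a workbook contains such features is to list the parts inside the .xlsx archive, since an .xlsx file is a zip container. The sketch below is only a rough diagnostic: the prefix list is an assumption based on the usual part names and is not an exhaustive inventory, and `find_risky_parts` and the demo archive are hypothetical names made up for the example.

```python
import os
import tempfile
import zipfile

# Part-name prefixes that commonly hold features which may not survive a
# load/save round trip. This list is an assumption, not a complete inventory.
RISKY_PREFIXES = ("xl/connections.xml", "xl/pivotTables/", "xl/pivotCache/",
                  "xl/slicers/", "xl/slicerCaches/", "xl/queryTables/")


def find_risky_parts(xlsx_path):
    """Return the archive members of xlsx_path that match RISKY_PREFIXES."""
    with zipfile.ZipFile(xlsx_path) as archive:
        return [name for name in archive.namelist()
                if name.startswith(RISKY_PREFIXES)]


# Demo on a fake archive that only mimics the part names of a real workbook.
demo = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
with zipfile.ZipFile(demo, "w") as z:
    z.writestr("xl/workbook.xml", "<workbook/>")
    z.writestr("xl/connections.xml", "<connections/>")

risky = find_risky_parts(demo)
print(risky)
```

If the list is not empty, it may be safer to keep the data in a separate file than to rewrite the workbook.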
I am trying to write a dataframe to an existing Excel document without having to load the workbook.
The reason is that my Excel file contains many pivot tables which cannot be read into Python. Therefore I want to know whether it is possible to write directly to the Excel file (and a specific sheet) without having to load the workbook.
I have previously tried writing to an Excel file, but this simply deletes the other existing sheets and leaves only the data I have written. So, to clarify, I need to make sure the other sheets are not deleted.
Thanks in advance.
as background, this code will NOT work for me
path = xxxxxxx
#loading workbook
book = load_workbook(path)
# book = load_workbook(path, read_only=False, keep_vba=True)
#object which writes to the workbook
writer = pd.ExcelWriter(path, engine='openpyxl')
#where object and loadworkbook are introduced
writer.book = book
#writing in all sheet names from the workbook
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
I am trying to use ExcelWriter to write/add some information into a workbook that contains multiple sheets.
The first time I call the function, it creates the workbook with some data. On the second call, I would like to add some information to the workbook at different locations in all sheets.
def Out_Excel(file_name,C,col):
    writer = pd.ExcelWriter(file_name,engine='xlsxwriter')
    for tab in tabs: # tabs here is provided from a different function that I did not write here to keep it simple and clean
        df = DataFrame(C) # the data is different for different sheets but I keep it simple in this case
        df.to_excel(writer,sheet_name = tab, startcol = 0 + col, startrow = 0)
    writer.save()
In the main code I call this function twice with different col to print out my data in different locations.
Out_Excel('test.xlsx',C,0)
Out_Excel('test.xlsx',D,10)
But the problem is that doing so the output is just the second call of the function as if the function overwrites the entire workbook. I guess I need to load the workbook that already exists in this case?
Any help?
Use load_workbook from openpyxl - see xlsxwriter and openpyxl docs:
import pandas as pd
from openpyxl import load_workbook
book = load_workbook('test.xlsx')
writer = pd.ExcelWriter('test.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name='tab_name')  # plus any other to_excel parameters you need
writer.save()
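If both blocks of data are available at the same time, another option is to keep a single writer open and place every block in one pass, so nothing is overwritten in the first place. The following is a minimal sketch under that assumption; `out_excel`, the `blocks` argument, the tab names, and the sample data are hypothetical and only loosely follow the question's Out_Excel function.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook


def out_excel(file_name, blocks, tabs):
    """Write every (data, col) pair in blocks to each sheet in tabs, in one pass."""
    with pd.ExcelWriter(file_name, engine="openpyxl") as writer:
        for tab in tabs:
            for data, col in blocks:
                # Repeated writes to the same sheet within one writer session
                # land side by side instead of replacing each other.
                pd.DataFrame(data).to_excel(writer, sheet_name=tab,
                                            startcol=col, startrow=0)


# Hypothetical sample data standing in for C and D.
C = {"x": [1, 2]}
D = {"y": [3, 4]}
file_name = os.path.join(tempfile.mkdtemp(), "test.xlsx")
out_excel(file_name, [(C, 0), (D, 10)], tabs=["first", "second"])

book = load_workbook(file_name)
sheet = book["first"]
print(sheet["B1"].value, sheet["L1"].value)
```

Because the dataframe index is written too, each block occupies one column more than its data, starting at the given startcol.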
Pandas version 0.24.0 added the mode keyword, which allows you to append to Excel workbooks without jumping through the hoops that we used to have to. Just use mode='a' to append sheets to an existing workbook.
From the documentation:
with ExcelWriter('path_to_file.xlsx', mode='a') as writer:
    df.to_excel(writer, sheet_name='Sheet3')
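A self-contained version of the documentation snippet might look like this. It is a sketch assuming openpyxl is installed, since append mode is not available with the xlsxwriter engine; the temporary path and the sample dataframe are invented for the example.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "path_to_file.xlsx")
df = pd.DataFrame({"a": [1, 2]})

# The first write creates the workbook with a single sheet.
df.to_excel(path, sheet_name="Sheet1", index=False)

# Append mode keeps Sheet1 and adds Sheet3 next to it.
with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
    df.to_excel(writer, sheet_name="Sheet3", index=False)

# Read the sheet names back to confirm that nothing was dropped.
with pd.ExcelFile(path) as xls:
    sheet_names = xls.sheet_names
print(sheet_names)
```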
You could also try using the following method to create your Excel spreadsheet:
import pandas as pd
def generate_excel(csv_file, excel_loc, sheet_):
    writer = pd.ExcelWriter(excel_loc)
    data = pd.read_csv(csv_file, header=0, index_col=False)
    data.to_excel(writer, sheet_name=sheet_, index=False)
    writer.save()
    return(writer.close())
Give this a try and let me know what you think.