I've got a problem.
I want to write a DataFrame to an existing Excel file that contains formulas.
When I open the workbook and write to it with a pandas writer, Excel always reports unreadable content and asks me to repair the file the next time I open it.
Do you know how to resolve this?
Here is my code for writing to the file:
import pandas as pd
from openpyxl import load_workbook

def Writer():
    book = load_workbook(r'C:\Users\List.xlsx')
    # header/index belong to to_excel(), and data_only is a load_workbook() option,
    # so none of them are passed to ExcelWriter here
    writer = pd.ExcelWriter(r'C:\Users\List.xlsx', engine='openpyxl')
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    datafin = FindReqdata.datafin
    datafin.to_excel(writer, sheet_name="SheetName", startrow=2,
                     startcol=5, index=False, header=False)
    writer.save()

Writer()
Have a look at this: https://stackoverflow.com/a/38075046/14367973
If I understood your question, you want to append more rows to an .xlsx file.
The new rows come from a DataFrame that has the same number of columns as the Excel file.
If that is what you are trying to do, the linked answer should help you.
Keep the .xlsx file closed while the script runs; having it open can sometimes break it.
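A minimal sketch of the same write for pandas 1.4 or newer (an assumption about your setup): instead of assigning `writer.book` (which pandas 2.0 no longer allows), open the writer with `mode="a"` and `if_sheet_exists="overlay"`, so pandas loads the existing workbook itself and writes into the existing sheet at the given offset. Formulas elsewhere in the sheet are kept as long as the workbook is not loaded with `data_only=True`. `write_block` is a hypothetical helper name; the path, sheet name and offsets come from the question.

```python
import pandas as pd


def write_block(df: pd.DataFrame, path: str, sheet_name: str,
                startrow: int = 0, startcol: int = 0) -> None:
    """Write df into an existing sheet of an existing .xlsx (pandas >= 1.4, openpyxl)."""
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        # startrow/startcol are 0-based: startrow=2, startcol=5 is Excel cell F3
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    startcol=startcol, index=False, header=False)
    # leaving the with-block calls writer.close(), which saves the workbook


# With the values from the question (datafin comes from the asker's code):
# write_block(datafin, r"C:\Users\List.xlsx", "SheetName", startrow=2, startcol=5)
```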
OK, so apparently openpyxl has a problem with connections in the Excel spreadsheet. Because one sheet contains connections, the file is broken after editing it. I am still trying to fix this bug.
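An .xlsx file is a zip archive, and workbook data connections are normally stored in a part named `xl/connections.xml`. A quick, hedged way to check whether a file contains connections before writing to it with openpyxl is to look for that part. `has_data_connections` is a hypothetical helper, and the absence of the part does not guarantee that openpyxl round-trips everything else in the file.

```python
import zipfile


def has_data_connections(path: str) -> bool:
    """Return True if the .xlsx package contains a workbook connections part."""
    with zipfile.ZipFile(path) as package:
        # part names inside the zip archive have no leading slash
        return "xl/connections.xml" in package.namelist()


# if has_data_connections(r"C:\Users\List.xlsx"): write to a copy instead of the original
```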
I am reading multiple XML files, extracting some data and building a pandas DataFrame from it. These are the main steps:
1. Open an XML file.
2. Extract some elements.
3. Create a pandas DataFrame with the extracted elements.
4. Append the results to the Excel file "output.xlsx" (using the Python code below).
I repeat these steps for all my XML files (15 GB of raw data, which usually yields about 100 MB of useful text data).
This is my Python code for appending the DataFrames to the output Excel file:
import pandas as pd
from openpyxl import load_workbook

book = load_workbook('output.xlsx')
writer = pd.ExcelWriter('output.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = {ws.title: ws for ws in book.worksheets}
startrow = writer.sheets['Sheet1'].max_row
output.to_excel(writer, startrow=startrow, index=False, header=False)
writer.save()
When I open "output.xlsx" in Excel, I get a prompt saying "We found a problem with some content in "output.xlsx". Do you want us to try to recover as much as we can?" with Yes/No options.
This is the log file that Excel generates:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<logFileName>error001280_01.xml</logFileName><summary>Errors were detected in
file 'D:\JUPYWORKDIR\2009Results\output.xlsx'</summary><repairedRecords>
<repairedRecord>Repaired Records: String properties from /xl/worksheets/sheet1.xml part
</repairedRecord></repairedRecords></recoveryLog>
I am worried that saving my results in Excel format is corrupting my data. I will read "output.xlsx" with pandas later for data analysis; does this problem affect that analysis? I would like to know why this problem occurs and whether I should save my data as CSV instead. Any suggestions?
P.S. The last row of "output.xlsx", checked with Python, matches the number of rows I get when importing the file into a pandas DataFrame. The last row of the file recovered by Microsoft Excel also matches, so I think it is a generic Excel error caused by the large amount of data, but I am not sure.
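On the CSV question: if the file is only used to accumulate rows for later analysis with pandas, appending to a CSV avoids re-loading and re-saving the whole workbook for every XML file, and it is not bound by Excel's limit of 1,048,576 rows per sheet. A minimal sketch, assuming every DataFrame has the same columns in the same order; `append_csv` and the file name `output.csv` are illustrative. CSV stores no dtypes, so columns such as dates need options like `parse_dates` when read back.

```python
import os

import pandas as pd


def append_csv(df: pd.DataFrame, path: str) -> None:
    """Append df to a CSV file, writing the header only when the file is new."""
    write_header = not os.path.exists(path)
    df.to_csv(path, mode="a", header=write_header, index=False)


# Inside the per-XML-file loop, instead of the ExcelWriter block:
# append_csv(output, "output.csv")
# Later, for the analysis:
# data = pd.read_csv("output.csv")
```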
I had the same issue and spent at least an hour searching for a fix. Only one change was needed: use writer.close() instead of writer.save(), and the issue should be resolved.
Here is the code above, modified:
import pandas as pd
from openpyxl import load_workbook

# strings_to_formulas / strings_to_urls are XlsxWriter options, not openpyxl ones
options = {}
options['strings_to_formulas'] = False
options['strings_to_urls'] = False
book = load_workbook('output.xlsx')
writer = pd.ExcelWriter('output.xlsx', engine='openpyxl', options=options)
writer.book = book
writer.sheets = {ws.title: ws for ws in book.worksheets}
startrow = writer.sheets['Sheet1'].max_row
output.to_excel(writer, startrow=startrow, index=False, header=False)
writer.close()  # instead of writer.save(); close() saves the workbook
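In recent pandas versions `close()` saves the workbook and also closes the underlying file handle, which is presumably why it fixed the corruption here; `save()` was deprecated in pandas 1.5 and removed in 2.0, and pandas 2.0 also no longer allows assigning `writer.book`. A sketch of the same append-at-the-bottom step for pandas 1.4 or newer, using a `with` block so `close()` always runs. The XlsxWriter `options` are left out because the engine here is openpyxl; `append_rows` is a hypothetical helper name.

```python
import pandas as pd


def append_rows(df: pd.DataFrame, path: str, sheet_name: str = "Sheet1") -> None:
    """Append df below the last used row of an existing sheet."""
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        # max_row is 1-based and startrow is 0-based, so this lands on the next empty row
        startrow = writer.book[sheet_name].max_row
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    index=False, header=False)
    # close() runs automatically here, saving the file and releasing the handle


# append_rows(output, "output.xlsx")
```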
I think it's an Excel problem with handling large data, because I opened the file in OpenOffice Calc and it didn't show any error.
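The repaired part in the log is "String properties" in `sheet1.xml`. One cause worth ruling out (an assumption, not something the log confirms) is a cell value longer than Excel's limit of 32,767 characters per cell: if the writer library lets such a string through, Excel has to repair the file. A small check to run on the DataFrame before writing; `find_long_strings` is a hypothetical helper.

```python
import pandas as pd

EXCEL_MAX_CELL_CHARS = 32767  # Excel's documented limit on characters per cell


def find_long_strings(df: pd.DataFrame, limit: int = EXCEL_MAX_CELL_CHARS) -> list[tuple]:
    """Return (row position, column name, length) for string cells longer than limit."""
    hits = []
    for col in df.columns:
        # non-string values (numbers, None, NaN) count as length 0
        lengths = df[col].map(lambda v: len(v) if isinstance(v, str) else 0)
        for pos in (lengths > limit).to_numpy().nonzero()[0]:
            hits.append((int(pos), col, int(lengths.iloc[pos])))
    return hits


# print(find_long_strings(output))  # an empty list means no cell exceeds the limit
```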
I've spent hours researching this issue but can't seem to find an answer. I have an Excel template that already has conditional formatting applied. I want to import a pandas DataFrame into this pre-formatted file so that the data is formatted accordingly (color, number format, etc.). Does anyone know whether this is doable, and if so, how?
I've considered writing a macro, importing it into Python and applying it to the DataFrame. I just want to see if there's an easier way that I haven't thought of or found. Thanks!
I would advise trying openpyxl:
import pandas
from openpyxl import load_workbook

book = load_workbook(excelpath)  # load the Excel file with its formats
writer = pandas.ExcelWriter(excelpath, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name="Sheet1", columns=['a', 'b'], index=False)  # only df columns 'a' and 'b' are written
writer.save()
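Another option is to skip `to_excel` and write only the values with openpyxl: assigning a value through `ws.cell(...)` leaves the cell's existing font, fill and number format untouched, and the sheet's conditional-formatting rules are not modified. A minimal sketch, assuming the template has a header in row 1 and data starting at A2; `fill_template` and the file names are hypothetical. NaN is written as an empty cell because Excel has no NaN value.

```python
import pandas as pd
from openpyxl import load_workbook


def fill_template(df: pd.DataFrame, template_path: str, out_path: str,
                  sheet_name: str = "Sheet1", first_row: int = 2,
                  first_col: int = 1) -> None:
    """Write df's values into a pre-formatted sheet, keeping the template's styles."""
    wb = load_workbook(template_path)  # no data_only=True, so formulas survive
    ws = wb[sheet_name]
    for r_offset, row in enumerate(df.itertuples(index=False)):
        for c_offset, value in enumerate(row):
            # only the value changes; the cell keeps its existing style
            ws.cell(row=first_row + r_offset, column=first_col + c_offset,
                    value=None if pd.isna(value) else value)
    wb.save(out_path)  # saving to a new file keeps the template reusable


# fill_template(df, "template.xlsx", "report.xlsx")
```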
I have a workbook called TEMPLATE.xlsx with two tabs, ALL_DATA_RAW and WEEKLY_DATA_RAW. I get my data from an API and feed it into the WEEKLY_DATA_RAW tab by opening the TEMPLATE workbook, deleting WEEKLY_DATA_RAW, recreating that same tab and storing the DataFrame from the API in it.
import openpyxl
import pandas as pd

book = openpyxl.load_workbook('TEMPLATE.xlsx')
writer = pd.ExcelWriter('TEMPLATE.xlsx', engine='openpyxl')
writer.book = book
book.remove(book['WEEKLY_DATA_RAW'])  # book[...] replaces the deprecated get_sheet_by_name()
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name="WEEKLY_DATA_RAW", index=False)
writer.save()
My first question: is there a way to do this without deleting and recreating WEEKLY_DATA_RAW? I would prefer to clear the current data in it and store the DataFrame there.
Second, after storing the data in WEEKLY_DATA_RAW, I also have to append the same data to the bottom of the ALL_DATA_RAW tab.
How do I go about this?
For your first issue, you can create a temporary variable that holds all your data without changing it. For the second issue, if I understand correctly, you want to combine/concatenate the data from the Excel sheets. Look at this video and let me know if that is what you're looking for: https://www.youtube.com/watch?v=kWaerL6-OiU
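Both steps can also be done with openpyxl alone, without recreating the sheet: set every existing value in WEEKLY_DATA_RAW to `None` (which keeps the tab and its formatting), write the new rows from the top, then `ws.append()` the same rows below the last used row of ALL_DATA_RAW. A minimal sketch, assuming both tabs use the same column order as `df` and that ALL_DATA_RAW already has its header row; `_rows` and `refresh_weekly` are hypothetical helpers.

```python
import pandas as pd
from openpyxl import load_workbook


def _rows(df: pd.DataFrame, header: bool):
    """Yield plain Python rows; NaN/None become empty cells because Excel has no NaN."""
    if header:
        yield [str(c) for c in df.columns]
    for row in df.itertuples(index=False):
        yield [None if pd.isna(v) else v for v in row]


def refresh_weekly(df: pd.DataFrame, path: str = "TEMPLATE.xlsx") -> None:
    wb = load_workbook(path)

    # 1) Clear WEEKLY_DATA_RAW in place instead of deleting and recreating it
    weekly = wb["WEEKLY_DATA_RAW"]
    for row in weekly.iter_rows():
        for cell in row:
            cell.value = None
    for r, values in enumerate(_rows(df, header=True), start=1):
        for c, value in enumerate(values, start=1):
            weekly.cell(row=r, column=c, value=value)

    # 2) Append the same rows (without header) below the existing data in ALL_DATA_RAW
    all_data = wb["ALL_DATA_RAW"]
    for values in _rows(df, header=False):
        all_data.append(values)

    wb.save(path)
```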
How do I edit spreadsheets using pandas or any other library?
I have a CSV that I read and filter, and I want to save the result into an existing, ready-made XLSX workbook.
But when I send the DataFrame to this XLSX file, the file is overwritten, removing all existing edits and sheets in the workbook.
This is what I'm trying:
import pandas as pd

excel_name = 'data/nessus/My Scans/Janeiro_2019/teste.xlsx'
writer = pd.ExcelWriter(excel_name, engine='xlsxwriter')
df5.to_excel(writer, sheet_name='FullExport', index=False)
workbook = writer.book
worksheet = writer.sheets['FullExport']
writer.save()
I think I'm doing something wrong, but I can't figure it out.
P.S.: This DataFrame should be written to the sheet called "FullExport", starting at row 2.
From pandas 0.24 on, ExcelWriter has a mode='a' option for appending; with older versions you have to do this:
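import pandas as pd
from openpyxl import load_workbook

writer = pd.ExcelWriter(excel_name, engine='openpyxl')
writer.book = load_workbook(excel_name)
# register the existing sheets so to_excel writes into FullExport instead of adding a renamed duplicate
writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
df5.to_excel(writer, sheet_name='FullExport', startrow=1, index=False)  # startrow=1 is Excel row 2
writer.close()  # close() also saves the workbook, so a separate writer.save() is not needed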
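With current pandas the same result needs no manual `writer.book` handling. `mode="a"` (openpyxl engine only) opens the existing file instead of creating a new one, so the other sheets are kept, and `if_sheet_exists="overlay"` (pandas 1.4 or newer) writes into the existing FullExport sheet instead of adding a renamed sheet or raising an error. A minimal sketch, assuming row 1 of FullExport already holds the column titles (hence `header=False`); `export_full` is a hypothetical helper name.

```python
import pandas as pd


def export_full(df: pd.DataFrame, excel_name: str) -> None:
    """Write df into the FullExport sheet from row 2, keeping all other sheets."""
    with pd.ExcelWriter(excel_name, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        # startrow is 0-based, so startrow=1 is Excel row 2
        df.to_excel(writer, sheet_name="FullExport", startrow=1,
                    index=False, header=False)


# export_full(df5, 'data/nessus/My Scans/Janeiro_2019/teste.xlsx')
```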
I have Final.xlsx, which contains multiple sheets (sheet1, sheet2, sheet3), each with some graphs and data. I have another file, file5.xlsx, that I want to add to Final.xlsx as a new tab. The code below works, but the existing sheets in Final.xlsx lose their data (contents, formats, graphs and more). I need help fixing this.
import pandas as pd
from openpyxl import load_workbook

book = load_workbook('foo.xlsx')
writer = pd.ExcelWriter('foo.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df1 = pd.read_excel('file5.xlsx')
df1.to_excel(writer, sheet_name="new", index=False)
writer.save()
Internally, older pandas versions used the xlrd library to read xlsx files. This library is fast, but because its xlsx support was essentially bolted onto its support for the older BIFF (.xls) format, that support is limited (xlrd 2.0 dropped .xlsx entirely, and current pandas reads .xlsx with openpyxl). Seeing as pandas doesn't know anything about charts, it couldn't keep them anyway.
openpyxl provides utilities in openpyxl.utils.dataframe for converting between XLSX rows and pandas DataFrames, giving you full control while keeping nearly everything else in your file. In your case, however, you don't even need pandas: you can simply loop over the cells of "file5.xlsx" and copy them to your other file.
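A minimal sketch of that pandas-free route with openpyxl only: read the values from the first sheet of file5.xlsx and append them to a new sheet in Final.xlsx. Only cell values are copied (not styles, merged cells or charts of file5.xlsx), and the sheet name "new" mirrors the question's code; `copy_sheet_values` is a hypothetical helper. Whether everything else in Final.xlsx survives still depends on what openpyxl can read and write back.

```python
from openpyxl import load_workbook


def copy_sheet_values(src_path: str, dst_path: str, new_title: str = "new") -> None:
    """Append every row of src_path's first sheet to a new sheet in dst_path."""
    # data_only=True copies cached formula results, roughly what pd.read_excel returns
    src_wb = load_workbook(src_path, read_only=True, data_only=True)
    try:
        src_ws = src_wb.worksheets[0]
        dst_wb = load_workbook(dst_path)
        dst_ws = dst_wb.create_sheet(title=new_title)
        for row in src_ws.iter_rows(values_only=True):
            dst_ws.append(row)  # values only; file5.xlsx styles are not copied
        dst_wb.save(dst_path)
    finally:
        src_wb.close()  # read-only workbooks keep the source file open until closed


# copy_sheet_values("file5.xlsx", "Final.xlsx")
```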