Write pandas df into excel file with xlsxwriter? - python

I have written code that writes a pandas DataFrame into an Excel file with openpyxl. See "Fill in pd data frame into existing excel sheet (using openpyxl v2.3.2)".
from openpyxl import load_workbook
import pandas as pd
book = load_workbook("excel_proc.xlsx")
writer = pd.ExcelWriter("excel_proc.xlsx", engine="openpyxl")
# assigning writer.book and calling writer.save() only work in pandas versions before 2.0
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
# data_df is the DataFrame to be written
data_df.to_excel(writer, sheet_name="example", startrow=100, startcol=5, index=False)
writer.save()
That procedure works fine. However, every resulting Excel file reports on opening that it is corrupted because some content is unreadable. Excel can repair it and save it again, but this has to be done manually. Since I have to process many files, how can I solve or work around this?
Alternatively, how do I have to change the code to use "xlsxwriter" instead of "openpyxl"?
When I simply replace engine="openpyxl" with engine="xlsxwriter", Python raises "'Worksheet' object has no attribute 'write'" at the data_df.to_excel line.
Addition: Excel reports "removed records named range of /xl/workbook.xml part" as the corruption that has to be repaired. I do not know what it means.

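I think you'll have to use openpyxl, because xlsxwriter doesn't support modifying existing Excel XLSX files. That is also why simply swapping the engine fails: the worksheets assigned to writer.sheets are openpyxl objects, and the xlsxwriter engine calls its own write() method on them.
From the docs: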
It cannot read or modify existing Excel XLSX files.
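For reference, on current pandas the openpyxl write from the question can be expressed without assigning writer.book at all. The sketch below assumes pandas 1.4 or later (for mode="a" together with if_sheet_exists="overlay") and openpyxl installed; the helper name write_df_into_sheet, the temporary file and the small data_df are hypothetical stand-ins. Whether it avoids the named-range repair message depends on what the original workbook contains, so try it on a copy of a real file first.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook


def write_df_into_sheet(path, df, sheet_name, startrow=0, startcol=0):
    """Write df into an existing sheet, leaving the sheet's other cells in place.

    Assumes pandas >= 1.4 (if_sheet_exists="overlay") with openpyxl installed.
    """
    with pd.ExcelWriter(
        path, engine="openpyxl", mode="a", if_sheet_exists="overlay"
    ) as writer:
        df.to_excel(
            writer,
            sheet_name=sheet_name,
            startrow=startrow,  # zero-based: 100 means Excel row 101
            startcol=startcol,  # zero-based: 5 means column F
            index=False,
        )


# Hypothetical stand-in for excel_proc.xlsx with some existing content.
path = os.path.join(tempfile.mkdtemp(), "excel_proc.xlsx")
wb = Workbook()
wb.active.title = "example"
wb.active["A1"] = "existing content"
wb.save(path)

data_df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
write_df_into_sheet(path, data_df, "example", startrow=100, startcol=5)

ws = load_workbook(path)["example"]
print(ws["A1"].value, ws["F101"].value, ws["F102"].value)
```

The existing A1 value survives, and the header row of data_df lands in row 101 starting at column F.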

Related

Trouble writing to Excel

I'm new to Python and trying to write into a merged cell in Excel. I can see the data that is already stored in this cell/row, so I know it's there. However, when I try to overwrite it, nothing happens.
I have tried messing with the index and header as well but nothing seems to work.
import pandas as pd
from openpyxl import load_workbook
# Read the excel file into a pandas DataFrame
df = pd.read_excel('file here', sheet_name='Sheet1')
print(df.iloc[8, 2])
# Make the changes to the DataFrame
df.iloc[8, 2] = "Bob Smith"
# Load the workbook
book = load_workbook('file here')
writer = pd.ExcelWriter('file here', engine='openpyxl')
writer.book = book
# Write the DataFrame to the first sheet
df.to_excel(writer, index=False)
# Save the changes to the Excel file
writer.save()
import pandas as pd
file = "C:/Users/OneDrive/Bureau/draftExcel.xlsx"
df = pd.read_excel(file, sheet_name='sheet1')
df.iat[5, 0] = 'cell is updated'
print(df)  # check in the terminal first that the cell content is updated
# ExcelWriter's default mode 'w' rewrites the whole file: only this sheet is kept, without its original formatting
writer = pd.ExcelWriter(file, engine='openpyxl')
df.to_excel(writer, sheet_name='sheet1', index=False)
writer.close()
I made an example based on your explanation, since you didn't show your code; I hope it helps.
Instead of .iloc I used .iat, which accesses a single cell by integer row and column position. It is the scalar counterpart of .iloc, just as .at is the scalar counterpart of the label-based .loc.
Remember that the Excel file must be closed while you edit it with Python; if it is open, you will get an error (a PermissionError on Windows).
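If the goal is only to change one cell, editing the workbook directly with openpyxl keeps the other sheets, the formatting and the merged ranges, which rewriting the file from a DataFrame does not. In openpyxl only the top-left cell of a merged range holds a value (the other cells are read-only MergedCell objects), so the hypothetical helper set_cell_value below redirects writes there. The C10:E10 merge and the file name are made-up examples; with one header row, df.iloc[8, 2] from the question would correspond to Excel cell C10.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook
from openpyxl.utils.cell import coordinate_to_tuple


def set_cell_value(ws, coord, value):
    """Set the value at coord; inside a merged range, write to its top-left cell."""
    row, col = coordinate_to_tuple(coord)  # e.g. "D10" -> (10, 4)
    for merged in ws.merged_cells.ranges:
        if merged.min_row <= row <= merged.max_row and merged.min_col <= col <= merged.max_col:
            # Only the top-left cell of a merged range can hold a value.
            row, col = merged.min_row, merged.min_col
            break
    ws.cell(row=row, column=col, value=value)


# Hypothetical sheet with C10:E10 merged, standing in for draftExcel.xlsx.
path = os.path.join(tempfile.mkdtemp(), "draftExcel.xlsx")
wb = Workbook()
wb.active["C10"] = "old name"
wb.active.merge_cells("C10:E10")
wb.save(path)

# Edit the existing file in place instead of rebuilding it from a DataFrame.
wb = load_workbook(path)
set_cell_value(wb.active, "D10", "Bob Smith")  # D10 lies inside C10:E10
wb.save(path)

check = load_workbook(path).active
print(check["C10"].value)  # Bob Smith
```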

Pandas dataframe to specific sheet in an Excel file without losing formatting

I have a dataframe like the one shown below
Date,cust,region,Abr,Number,
12/01/2010,Company_Name,Somecity,Chi,36,
12/02/2010,Company_Name,Someothercity,Nyc,156,
df = pd.read_clipboard(sep=',')
I would like to write this dataframe to a specific sheet (called temp_data) in the file output.xlsx
Therefore I tried the following:
import pandas
from openpyxl import load_workbook
book = load_workbook('output.xlsx')
writer = pandas.ExcelWriter('output.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
I also tried the following:
import openpyxl
import pandas as pd
path = 'output.xlsx'
with pd.ExcelWriter(path) as writer:
    writer.book = openpyxl.load_workbook(path)
    final_df.to_excel(writer, sheet_name='temp_data', startrow=10)
    writer.save()
But I am not sure whether I am overcomplicating it. I get the error shown below, although I verified in Task Manager that no Excel file/task is running:
BadZipFile: File is not a zip file
Moreover, when I do manage to write the file using the suggestions below, I lose the formatting of output.xlsx. The file is already neatly formatted (fonts, colors, etc.) and I just need to put the data inside.
Is there any way to write the pandas dataframe to a specific sheet in an existing Excel file WITHOUT LOSING THE FORMATTING OF THE DESTINATION FILE?
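One way to keep the destination formatting is to bypass to_excel and write only cell values with openpyxl, since setting a cell's value leaves its font, fill and borders alone. The sketch below uses openpyxl.utils.dataframe.dataframe_to_rows; the helper name paste_values and the bold red row standing in for the formatted output.xlsx are assumptions for illustration. As for the BadZipFile error, a likely cause in recent pandas versions is that pd.ExcelWriter(path) in its default write mode opens output.xlsx for writing (emptying it) before openpyxl.load_workbook(path) reads it. openpyxl also does not read every kind of object in a workbook (its docs mention shapes), so try this on a copy first.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font
from openpyxl.utils.dataframe import dataframe_to_rows


def paste_values(ws, df, first_row=1, first_col=1, header=True):
    """Write df cell by cell; only .value is set, so existing styles are untouched."""
    rows = dataframe_to_rows(df, index=False, header=header)
    for r, row in enumerate(rows, start=first_row):
        for c, value in enumerate(row, start=first_col):
            ws.cell(row=r, column=c, value=value)


# Hypothetical stand-in for output.xlsx: sheet temp_data with a formatted row 11.
path = os.path.join(tempfile.mkdtemp(), "output.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "temp_data"
for col in range(1, 6):
    ws.cell(row=11, column=col).font = Font(bold=True, color="FF0000")
wb.save(path)

df = pd.DataFrame(
    {
        "Date": ["12/01/2010", "12/02/2010"],
        "cust": ["Company_Name", "Company_Name"],
        "region": ["Somecity", "Someothercity"],
        "Abr": ["Chi", "Nyc"],
        "Number": [36, 156],
    }
)

wb = load_workbook(path)
paste_values(wb["temp_data"], df, first_row=11)  # same place as startrow=10 in to_excel
wb.save(path)

out = load_workbook(path)["temp_data"]
print(out["A11"].value, out["A11"].font.bold, out["E13"].value)
```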
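You just need to use to_excel on the pandas dataframe.
Try the snippet below (note that it creates or overwrites output.xlsx, so other sheets and formatting in the file are not kept):
df1.to_excel("output.xlsx", sheet_name='Sheet_name')
If there is existing data, try the snippet below:
import pandas as pd
from openpyxl import load_workbook
# existing rows of the first sheet, used to start writing below them
reader = pd.read_excel('output.xlsx')
# load the workbook before ExcelWriter opens the same path for writing
book = load_workbook('output.xlsx')
writer = pd.ExcelWriter('output.xlsx', engine='openpyxl')
# try to reuse the existing workbook (assigning writer.book is not possible in pandas 2.0+)
writer.book = book
df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)
writer.close()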
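On pandas 1.4 or later, appending below existing rows can be done without touching writer.book. This sketch assumes the openpyxl engine and uses the sheet's max_row to choose startrow; append_rows and the demo data are hypothetical. Note that openpyxl reports max_row as 1 for an empty sheet, so in that case the first row would be left blank.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook


def append_rows(path, df, sheet_name):
    """Append df's rows (without header) below the last used row of sheet_name.

    Assumes pandas >= 1.4 (if_sheet_exists="overlay") and the openpyxl engine.
    """
    with pd.ExcelWriter(path, engine="openpyxl", mode="a", if_sheet_exists="overlay") as writer:
        # max_row is the 1-based index of the last used row; used as a
        # zero-based startrow it points at the first empty row below it.
        startrow = writer.sheets[sheet_name].max_row
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow, index=False, header=False)


# Hypothetical existing file with a header and one data row.
path = os.path.join(tempfile.mkdtemp(), "output.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws.append(["name", "value"])
ws.append(["a", 1])
wb.save(path)

append_rows(path, pd.DataFrame({"name": ["b", "c"], "value": [2, 3]}), "Sheet1")

result = load_workbook(path)["Sheet1"]
print(list(result.iter_rows(values_only=True)))
```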
Are you restricted to using pandas or openpyxl?
If you're comfortable using other libraries, the easiest way is probably to use win32com to drive Excel as if you were a user manually copying and pasting the information over.
import pandas as pd
import io
import win32com.client as win32
import os
csv_text = """Date,cust,region,Abr,Number
12/01/2010,Company_Name,Somecity,Chi,36
12/02/2010,Company_Name,Someothercity,Nyc,156"""
df = pd.read_csv(io.StringIO(csv_text),sep = ',')
temp_path = r"C:\Users\[User]\Desktop\temp.xlsx" #temporary location where to write this dataframe
df.to_excel(temp_path,index = False) #temporarily write this file to excel, change the output path as needed
excel = win32.Dispatch("Excel.Application")
excel.Visible = True  # Switch these attributes to False if you'd prefer Excel to be invisible while executing this script
excel.ScreenUpdating = True
temp_wb = excel.Workbooks.Open(temp_path)
temp_ws = temp_wb.Sheets("Sheet1")
output_path = r"C:\Users\[User]\Desktop\output.xlsx" #Path to your output excel file
output_wb = excel.Workbooks.Open(output_path)
output_ws = output_wb.Sheets("Output_sheet")
temp_ws.Range('A1').CurrentRegion.Copy(Destination = output_ws.Range('A1')) # Feel free to modify the Cell where you'd like the data to be copied to
input('Check that output looks like you expected\n') # Added pause here to make sure script doesn't overwrite your file before you've looked at the output
temp_wb.Close()
output_wb.Close(True) #Close output workbook and save changes
excel.Quit() #Close excel
os.remove(temp_path) #Delete temporary excel file
Let me know if this achieves what you were after.
I spent all day on this (and a co-worker of mine spent even longer). Thankfully, it seems to work for my purposes: pasting a dataframe into an Excel sheet without changing any of the Excel source formatting. It requires the pywin32 package, which "drives" Excel as if it were a user, through the same object model that VBA uses.
import os
import pandas as pd
from win32com import client
# Grab your source data any way you please; I'm defining it manually here:
df = pd.DataFrame([
    ['LOOK','','','','','','','',''],
    ['','MA!','','','','','','',''],
    ['','','I pasted','','','','','',''],
    ['','','','into','','','','',''],
    ['','','','','Excel','','','',''],
    ['','','','','','without','','',''],
    ['','','','','','','breaking','',''],
    ['','','','','','','','all the',''],
    ['','','','','','','','','FORMATTING!']
])
# Copy the df to clipboard, so we can later paste it as text.
df.to_clipboard(index=False, header=False)
excel_app = client.gencache.EnsureDispatch("Excel.Application")  # Initialize instance
# Excel resolves relative paths against its own current folder, not Python's working directory
wb = excel_app.Workbooks.Open(os.path.abspath("Template.xlsx"))  # Load your (formatted) template workbook
ws = wb.Worksheets(1)  # First worksheet; you could also refer to a sheet by name
ws.Activate()  # Range.Select() only works on the active sheet
ws.Range("A3").Select()  # Only select a single cell using Excel nomenclature, otherwise this breaks
ws.PasteSpecial(Format='Unicode Text')  # Paste as text
wb.SaveAs(os.path.abspath("Updated Template.xlsx"))  # Save our work
excel_app.Quit()  # End the Excel instance
In general, when using the win32com approach, it helps to record a macro of yourself doing the task in Excel and then read the generated code. This often gives excellent clues about which commands you could invoke.
The solution to your problem exists here: How to save a new sheet in an existing excel file, using Pandas?
To add a new sheet from a df:
import pandas as pd
from openpyxl import load_workbook
import os
import numpy as np
os.chdir(r'C:\workdir')
path = 'output.xlsx'
book = load_workbook(path)
writer = pd.ExcelWriter(path, engine = 'openpyxl')
writer.book = book
### replace with your df ###
x = np.random.randn(100, 2)
df = pd.DataFrame(x)
df.to_excel(writer, sheet_name = 'x')
writer.close()  # close() also saves the file
You can try xltpl.
Create a template file based on your output.xlsx file.
Render a file with your data.
from xltpl.writerx import BookWriterx
writer = BookWriterx('template.xlsx')
d = {'rows': df.values}
d['tpl_name'] = 'tpl_sheet'
d['sheet_name'] = 'temp_data'
writer.render_sheet(d)
d['tpl_name'] = 'other_sheet'
d['sheet_name'] = 'other'
writer.render_sheet(d)
writer.save('out.xlsx')
See examples.

Copying a list to existing excel python

I have multiple lists in my Python code and I want to copy them to different columns of an already existing Excel file.
writer = pd.ExcelWriter('sample.xlsx')
pd.DataFrame(timedata).to_excel(writer, 'timedata')
writer.save()
This writes the list to Excel, but it always overwrites the data in the file, and this code does not cover writing multiple lists into multiple columns.
Pandas uses openpyxl for xlsx files (as mentioned in the pandas docs). Checking the docs for ExcelWriter, something like this might work:
import pandas
from openpyxl import load_workbook
book = load_workbook('sample.xlsx')
writer = pandas.ExcelWriter('sample.xlsx', engine='openpyxl')
writer.book = book
## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
#data_filtered is a pd dataframe
data_filtered.to_excel(writer, sheet_name="Main", columns=['col1', 'col2'])
writer.save()
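If you are using pandas 0.24 or later, the process is even simpler:
import pandas as pd
with pd.ExcelWriter('sample.xlsx', engine='openpyxl', mode='a') as writer:
    data_filtered.to_excel(writer)
Note that by default mode='a' does not write into a sheet that already exists (older versions add a renamed sheet such as "Sheet11", newer ones raise an error); pandas 1.3 added if_sheet_exists to control this, and its 'overlay' option arrived in 1.4.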
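For the original question (several lists in separate columns of an existing sheet), combining mode='a' with if_sheet_exists='overlay' (pandas 1.4+) lets each list be written as a one-column DataFrame shifted with startcol. The second list, the sheet name Main and the sample.xlsx stand-in are hypothetical; wrapping each list separately also lets the lists have different lengths.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

# Hypothetical lists standing in for timedata and the other lists in the question.
timedata = [0.5, 1.0, 1.5]
temperatures = [20.1, 20.4]

# Hypothetical existing workbook with an empty sheet named Main.
path = os.path.join(tempfile.mkdtemp(), "sample.xlsx")
wb = Workbook()
wb.active.title = "Main"
wb.save(path)

# One to_excel call per list, each shifted with startcol (zero-based: 0 -> A, 1 -> B).
columns = [("timedata", timedata), ("temperatures", temperatures)]
with pd.ExcelWriter(path, engine="openpyxl", mode="a", if_sheet_exists="overlay") as writer:
    for col, (name, values) in enumerate(columns):
        pd.DataFrame({name: values}).to_excel(writer, sheet_name="Main", startcol=col, index=False)

ws = load_workbook(path)["Main"]
rows = list(ws.iter_rows(values_only=True))
print(rows)
```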

Data missing, format changed in .xlsx file having multiple sheets using pandas, openpyxl while adding new sheet in existing .xlsx file

I have a Final.xlsx that contains multiple sheets (sheet1, sheet2, sheet3), each with some graphs and data. I have another file, file5.xlsx, that I want to add to Final.xlsx as a new tab. The code below works, but the existing sheets of Final.xlsx lose their data (contents, formats, graphs and others). I need help fixing this.
import pandas as pd
from openpyxl import load_workbook
book = load_workbook('foo.xlsx')
writer = pd.ExcelWriter('foo.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df1 = pd.read_excel('file5.xlsx')
df1.to_excel(writer, sheet_name="new", index=False)
writer.save()
Older versions of Pandas used the xlrd library to read xlsx files (pandas 1.2 switched the default xlsx reader to openpyxl). xlrd is fast but, because its xlsx support was essentially bolted onto its support for the BIFF (.xls) format, that support was limited. Seeing as Pandas doesn't know anything about charts, it couldn't keep them anyway.
openpyxl provides utilities in openpyxl.utils.dataframe for converting between worksheet rows and Pandas DataFrames, giving you full control while keeping nearly everything else in your file. In your case, however, you don't even need Pandas: you can simply loop over the cells of "file5.xlsx" and copy them to your other file.
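A sketch of that cell-copying approach with plain openpyxl: it appends the values of the first sheet of file5.xlsx to a new sheet in Final.xlsx. The helper name and the demo files are hypothetical. It copies values only (no styles), and data_only=True means formulas in the source come through as the results Excel last calculated (None if the file was never calculated and saved by Excel).

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def copy_values_to_new_sheet(src_path, dst_path, new_title):
    """Copy the cell values of src_path's first sheet into a new sheet of dst_path.

    The destination's other sheets are loaded and saved back by openpyxl
    rather than rebuilt from a DataFrame.
    """
    src_ws = load_workbook(src_path, data_only=True).active  # cached results instead of formulas
    dst_wb = load_workbook(dst_path)
    dst_ws = dst_wb.create_sheet(new_title)
    for row in src_ws.iter_rows(values_only=True):
        dst_ws.append(row)
    dst_wb.save(dst_path)


# Hypothetical stand-ins for file5.xlsx and Final.xlsx.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "file5.xlsx")
dst = os.path.join(tmp, "Final.xlsx")

wb = Workbook()
wb.active.append(["id", "score"])
wb.active.append([1, 9.5])
wb.save(src)

wb = Workbook()
wb.active.title = "sheet1"
wb.active["A1"] = "keep me"
wb.save(dst)

copy_values_to_new_sheet(src, dst, "new")

final = load_workbook(dst)
print(final.sheetnames)
```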

How to write to an Excel sheet without exporting a dataframe first?

I am trying to write some text to a specific sheet in an Excel file. I export a number of pandas dataframes to the other tabs, but in this one I need only some text - basically some comments explaining how the other tabs were calculated.
I have tried this but it doesn't work:
import pandas as pd
writer=pd.ExcelWriter('myfile.xlsx')
writer.sheets['mytab'].write(1,1,'This is a test')
writer.close()
I have tried adding writer.book.add_worksheet('mytab') and
ws=writer.sheets['mytab']
ws.write(1,1,'This is a test')
but in all cases I get KeyError: 'mytab'.
The only solution I have found is to write an empty dataframe to the tab before writing my text to the same tab:
emptydf=pd.DataFrame()
emptydf['x']=[None]
emptydf.to_excel(writer,'mytab',header=False, index=False)
I could of course create a workbook instance, as in the example in the xlsxwriter documentation: http://xlsxwriter.readthedocs.io/worksheet.html
However, my problem is that I already have a pd.ExcelWriter instance, which is used in the rest of my code to create the other excel sheets.
I even tried passing a workbook instance to to_excel(), but it doesn't work:
workbook = xlsxwriter.Workbook('filename.xlsx')
emptydf.to_excel(workbook,'mytab',header=False, index=False)
Is there any alternative to my solution of exporting an empty dataframe - which seems as unpythonic as it can get?
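You mentioned that you tried the add_worksheet() method of the writer.book object, and it does seem to work and do what you want. The key is to keep the worksheet object that add_worksheet() returns instead of looking it up in writer.sheets, which (at least in the pandas version below) only contains sheets written by to_excel(). Below is a reproducible example that worked successfully.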
import pandas as pd
print(pd.__version__)
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
workbook = writer.book
ws = workbook.add_worksheet('mytab')
ws.write(1,1,'This is a test')
writer.close()
Thought I'd also mention that I'm using pandas 0.18.1.
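If you use the openpyxl engine instead of xlsxwriter, the same idea works with openpyxl's own API: writer.book is then an openpyxl Workbook, so create_sheet() adds the text-only sheet. This is a swapped-in variant, not the engine the answer above used; the file name and the "data" sheet are hypothetical. Note the indexing difference: xlsxwriter's write(row, col) is zero-based, while openpyxl's cell(row=..., column=...) is 1-based.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

path = os.path.join(tempfile.mkdtemp(), "myfile.xlsx")

with pd.ExcelWriter(path, engine="openpyxl") as writer:
    # A normal DataFrame sheet, as in the rest of the asker's code.
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="data", index=False)
    # With the openpyxl engine, writer.book is an openpyxl Workbook.
    notes = writer.book.create_sheet("mytab")
    # openpyxl rows/columns are 1-based, so this is cell B2, the same cell
    # that xlsxwriter's zero-based ws.write(1, 1, ...) targets.
    notes.cell(row=2, column=2, value="This is a test")

wb = load_workbook(path)
print(wb.sheetnames, wb["mytab"]["B2"].value)
```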
