I have an excel file where the first four rows contain some header text and the actual dataset starts from row 4. I am trying to build a simple function that reads the excel file and outputs the same excel file after deleting the first 4 rows.
This is what my code looks like before I put it into a function.
import pandas as pd
from openpyxl import load_workbook, Workbook
wb = load_workbook('FILEPATH/excel.xlsx')
ws = wb['Sheet1']
ws = ws.delete_rows(0,4)
wb.save(r"FILEPATH/deleted_row.xlsx")
When I run the code it executes the file properly but when I try to open the excel file it give me errors and says that the file is corrupted. A point to note is that the excel file has some formatting on the rop rows. Is that what is causing some issues?
Any help is appreciated.
EDIT: This is what the errors look like and the file does not open.
In openpyxl, the first row should be 1, not 0. So, if you are looking to delete the first 4 rows, you should change the delete_row() from
ws = ws.delete_rows(0,4)
to
ws = ws.delete_rows(1,4)
I have a dataframe like as shown below
Date,cust,region,Abr,Number,
12/01/2010,Company_Name,Somecity,Chi,36,
12/02/2010,Company_Name,Someothercity,Nyc,156,
df = pd.read_clipboard(sep=',')
I would like to write this dataframe to a specific sheet (called temp_data) in the file output.xlsx
Therfore I tried the below
import pandas
from openpyxl import load_workbook
book = load_workbook('output.xlsx')
writer = pandas.ExcelWriter('output.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
I also tried the below
path = 'output.xlsx'
with pd.ExcelWriter(path) as writer:
writer.book = openpyxl.load_workbook(path)
final_df.to_excel(writer, sheet_name='temp_data',startrow=10)
writer.save()
But am not sure whether I am overcomplicating it. I get an error like as shown below. But I verifiedd in task manager, no excel file/task is running
BadZipFile: File is not a zip file
Moreover, I also lose my formatting of the output.xlsx file when I manage to write the file based on below suggestions. I already have a neatly formatted font,color file etc and just need to put the data inside.
Is there anyway to write the pandas dataframe to a specific sheet in an existing excel file? WITHOUT LOSING FORMATTING OF THE DESTIATION FILE
You need to just use to_excel from pandas dataframe.
Try below snippet:
df1.to_excel("output.xlsx",sheet_name='Sheet_name')
If there is existing data please try below snippet:
writer = pd.ExcelWriter('output.xlsx', engine='openpyxl')
# try to open an existing workbook
writer.book = load_workbook('output.xlsx')
df.to_excel(writer,index=False,header=False,startrow=len(reader)+1)
writer.save()
writer.close()
Are you restricted to using pandas or openpyxl?
Because if you're comfortable using other libraries, the easiest way is probably using win32com to puppet excel as if you were a user manually copying and pasting the information over.
import pandas as pd
import io
import win32com.client as win32
import os
csv_text = """Date,cust,region,Abr,Number
12/01/2010,Company_Name,Somecity,Chi,36
12/02/2010,Company_Name,Someothercity,Nyc,156"""
df = pd.read_csv(io.StringIO(csv_text),sep = ',')
temp_path = r"C:\Users\[User]\Desktop\temp.xlsx" #temporary location where to write this dataframe
df.to_excel(temp_path,index = False) #temporarily write this file to excel, change the output path as needed
excel = win32.Dispatch("Excel.Application")
excel.Visible = True #Switch these attributes to False if you'd prefer Excel to be invisible while excecuting this script
excel.ScreenUpdating = True
temp_wb = excel.Workbooks.Open(temp_path)
temp_ws = temp_wb.Sheets("Sheet1")
output_path = r"C:\Users\[User]\Desktop\output.xlsx" #Path to your output excel file
output_wb = excel.Workbooks.Open(output_path)
output_ws = output_wb.Sheets("Output_sheet")
temp_ws.Range('A1').CurrentRegion.Copy(Destination = output_ws.Range('A1')) # Feel free to modify the Cell where you'd like the data to be copied to
input('Check that output looks like you expected\n') # Added pause here to make sure script doesn't overwrite your file before you've looked at the output
temp_wb.Close()
output_wb.Close(True) #Close output workbook and save changes
excel.Quit() #Close excel
os.remove(temp_path) #Delete temporary excel file
Let me know if this achieves what you were after.
I spent all day on this (and a co-worker of mine spent even longer). Thankfully, it seems to work for my purposes - pasting a dataframe into an Excel sheet without changing any of the Excel source formatting. It requires the pywin32 package, which "drives" Excel as if it a user, using VBA.
import pandas as pd
from win32com import client
# Grab your source data any way you please - I'm defining it manually here:
df = pd.DataFrame([
['LOOK','','','','','','','',''],
['','MA!','','','','','','',''],
['','','I pasted','','','','','',''],
['','','','into','','','','',''],
['','','','','Excel','','','',''],
['','','','','','without','','',''],
['','','','','','','breaking','',''],
['','','','','','','','all the',''],
['','','','','','','','','FORMATTING!']
])
# Copy the df to clipboard, so we can later paste it as text.
df.to_clipboard(index=False, header=False)
excel_app = client.gencache.EnsureDispatch("Excel.Application") # Initialize instance
wb = excel_app.Workbooks.Open("Template.xlsx") # Load your (formatted) template workbook
ws = wb.Worksheets(1) # First worksheet becomes active - you could also refer to a sheet by name
ws.Range("A3").Select() # Only select a single cell using Excel nomenclature, otherwise this breaks
ws.PasteSpecial(Format='Unicode Text') # Paste as text
wb.SaveAs("Updated Template.xlsx") # Save our work
excel_app.Quit() # End the Excel instance
In general, when using the win32com approach, it's helpful to record yourself (with a macro) doing what you want to accomplish in Excel, then reading the generated macro code. Often this will give you excellent clues as to what commands you could invoke.
The solution to your problem exists here: How to save a new sheet in an existing excel file, using Pandas?
To add a new sheet from a df:
import pandas as pd
from openpyxl import load_workbook
import os
import numpy as np
os.chdir(r'C:\workdir')
path = 'output.xlsx'
book = load_workbook(path)
writer = pd.ExcelWriter(path, engine = 'openpyxl')
writer.book = book
### replace with your df ###
x = np.random.randn(100, 2)
df = pd.DataFrame(x)
df.to_excel(writer, sheet_name = 'x')
writer.save()
writer.close()
You can try xltpl.
Create a template file based on your output.xlsx file.
Render a file with your data.
from xltpl.writerx import BookWriterx
writer = BookWriterx('template.xlsx')
d = {'rows': df.values}
d['tpl_name'] = 'tpl_sheet'
d['sheet_name'] = 'temp_data'
writer.render_sheet(d)
d['tpl_name'] = 'other_sheet'
d['sheet_name'] = 'other'
writer.render_sheet(d)
writer.save('out.xls')
See examples.
I created one excel file and wrote something in it. I am trying to read that file through pandas - dataframe, but I am getting error
XLRDError: Unsupported format, or corrupt file: Expected BOF record
Code -
import pandas as pd
a = open("D:\\Joseph\\abcsaa.xlsx","a")
a.write("Hello all")
p = pd.read_excel("D:\\Joseph\\abcsaa.xlsx")
p
Thanks for the answers. I need to store tick data in a excel and then read it through dataframe.
What is the use of open function in python for excel file if I have to use other modules for this ?
Excel file cannot be created with inbuilt python open function. You have to use openpyxl package to read and write excel files.
Some besic operations using openpyxl
import openpyxl
# Open Workbook
wb = openpyxl.load_workbook(filename='example.xlsx', data_only=True)
# Get All Sheets
a_sheet_names = wb.get_sheet_names()
print(a_sheet_names)
# Get Sheet Object by names
o_sheet = wb.get_sheet_by_name("Sheet1")
print(o_sheet)
# Get Cell Values
o_cell = o_sheet['A1']
print(o_cell.value)
o_cell = o_sheet.cell(row=2, column=1)
print(o_cell.value)
o_cell = o_sheet['H1']
print(o_cell.value)
# Sheet Maximum filled Rows and columns
print(o_sheet.max_row)
print(o_sheet.max_column)
Install this if you don't already have it.
pip install XlsxWriter
Code:
import xlsxwriter
workbook = xlsxwriter.Workbook("D:\\Joseph\\abcsaa.xlsx")
worksheet = workbook.add_worksheet()
worksheet.write('A1', 'Hello world')
workbook.close()
XLsxWriter can do a lot and has great documentation here.
If the file already exists, open it the first time with
a = pd.read_excel('path/aabcsaa.xlsx')
Else, create a pandas dataframe with
a = pd.DataFrame(data)
and then save it using
pd.to_excel('path/aabcsaa.xlsx')
You opened your file in append mode ("a"). If you want to read it with read_excel by passing the filename, you need to close the file before:
a.close()
And the content of the file needs to be in valid excel format.
I have a .xlsx file in which multiple worksheets are there (with some content). I want to write some data into specific sheets say sheet1 and sheet5. Right now I am doing it using xlrd, xlwt, and xlutils copy() function. But is there any way to do it by opening the file in append mode and adding the data and save it (Like as we do it for the text/CSV files)?
Here is my code:
rb = open_workbook("C:\text.xlsx",formatting_info='True')
wb = copy(rb)
Sheet1 = wb.get_sheet(8)
Sheet2 = wb.get_sheet(7)
Sheet1.write(0,8,'Obtained_Value')
Sheet2.write(0,8,'Obtained_Value')
value1 = [1,2,3,4]
value2 = [5,6,7,8]
for i in range(len(value1)):
Sheet1.write(i+1,8,value1[i])
for j in range(len(value2)):
Sheet2.write(j+1,8,value2[j])
wb.save("C:\text.xlsx")
You can do it using the openpyxl module or using the xlwings module
Using openpyxl
from openpyxl import workbook #pip install openpyxl
from openpyxl import load_workbook
wb = load_workbook("C:\text.xlsx")
sheets = wb.sheetnames
Sheet1 = wb[sheets[8]]
Sheet2 = wb[sheets[7]]
#Then update as you want it
Sheet1 .cell(row = 2, column = 4).value = 5 #This will change the cell(2,4) to 4
wb.save("HERE PUT THE NEW EXCEL PATH")
the text.xlsx file will be used as a template, all the values from text.xlsx file together with the updated values will be saved in the new file
Using xlwings
import xlwings
wb = xlwings.Book("C:\text.xlsx")
Sheet1 = wb.sheets[8]
Sheet2 = wb.sheets[7]
#Then update as you want it
Sheet1.range(2, 4).value = 4 #This will change the cell(2,4) to 4
wb.save()
wb.close()
Here the file will be updated in the text.xlsx file but if you want to have a copy of the file you can use the code below
shutil.copy("C:\text.xlsx", "C:\newFile.xlsx") #copies text.xslx file to newFile.xslx
and use
wb = xlwings.Book("C:\newFile.xlsx") instead of wb = xlwings.Book("C:\text.xlsx")
As a user of both modules I prefer the second one over the first one.
For manipulating existing excel files you should use openpyxl. Other common libraries like the ones you are using dont support manipulating existing excel files. A workaround is to
save your output file as a different name - text_temp.xlsx
delete your original file - text.xlsx
rename your output file - text_temp.xlsx to text.xlsx