Not able to read excel file which I created through open function? - python

I created an Excel file and wrote something to it. I am trying to read that file into a pandas DataFrame, but I am getting this error:
XLRDError: Unsupported format, or corrupt file: Expected BOF record
Code -
import pandas as pd
a = open("D:\\Joseph\\abcsaa.xlsx","a")
a.write("Hello all")
p = pd.read_excel("D:\\Joseph\\abcsaa.xlsx")
p
Thanks for the answers. I need to store tick data in an Excel file and then read it into a DataFrame.
What is the use of the open function in Python for Excel files if I have to use other modules for this?

An Excel file cannot be created with Python's built-in open function, which only writes raw text or bytes. Use a package such as openpyxl to read and write Excel files.
Some basic operations using openpyxl:
import openpyxl
# Open Workbook
wb = openpyxl.load_workbook(filename='example.xlsx', data_only=True)
# Get all sheet names (get_sheet_names() is deprecated in current openpyxl)
a_sheet_names = wb.sheetnames
print(a_sheet_names)
# Get a sheet object by name (replaces get_sheet_by_name)
o_sheet = wb["Sheet1"]
print(o_sheet)
# Get Cell Values
o_cell = o_sheet['A1']
print(o_cell.value)
o_cell = o_sheet.cell(row=2, column=1)
print(o_cell.value)
o_cell = o_sheet['H1']
print(o_cell.value)
# Sheet Maximum filled Rows and columns
print(o_sheet.max_row)
print(o_sheet.max_column)

Install this if you don't already have it.
pip install XlsxWriter
Code:
import xlsxwriter
workbook = xlsxwriter.Workbook("D:\\Joseph\\abcsaa.xlsx")
worksheet = workbook.add_worksheet()
worksheet.write('A1', 'Hello world')
workbook.close()
XlsxWriter can do a lot and has great documentation.

If the file already exists, open it the first time with
a = pd.read_excel('path/aabcsaa.xlsx')
Else, create a pandas dataframe with
a = pd.DataFrame(data)
and then save it using
a.to_excel('path/aabcsaa.xlsx')
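As a rough sketch of the "store tick data, then read it back" workflow from the question: the column names and values below are made up for illustration, the file goes to a temporary directory, and pandas is assumed to have an .xlsx engine such as openpyxl installed.

```python
import os
import tempfile

import pandas as pd

# Made-up tick data; the column names are placeholders, not from the question.
ticks = pd.DataFrame({
    "symbol": ["ABC", "ABC", "XYZ"],
    "price": [101.5, 101.75, 55.25],
    "volume": [200, 150, 900],
})

path = os.path.join(tempfile.mkdtemp(), "ticks.xlsx")
ticks.to_excel(path, index=False)  # to_excel is a DataFrame method and writes a real workbook
loaded = pd.read_excel(path)       # read the workbook back into a DataFrame
print(loaded)
```

Note that to_excel is called on the DataFrame itself; there is no module-level pd.to_excel function.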

You opened your file in append mode ("a") and never closed it. If you want to read it with read_excel by passing the filename, close the file first:
a.close()
The content of the file also needs to be in a valid Excel format, which plain text written with write() is not.
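To illustrate why the text written with open() is rejected: an .xlsx file is a ZIP archive of XML parts, so a quick necessary (not sufficient) check is whether the file is a ZIP archive at all. The sketch below uses only the standard library; the helper name and the temporary file are made up for illustration.

```python
import os
import tempfile
import zipfile


def looks_like_xlsx_container(path):
    """Return True if the file is at least a ZIP archive (necessary for .xlsx)."""
    return zipfile.is_zipfile(path)


# Hypothetical file, written the same way as in the question.
tmp_dir = tempfile.mkdtemp()
fake_xlsx = os.path.join(tmp_dir, "abcsaa.xlsx")
with open(fake_xlsx, "a") as handle:  # the with-block also closes the file
    handle.write("Hello all")

print(looks_like_xlsx_container(fake_xlsx))  # False: plain text, not a workbook
```

A file produced by openpyxl, XlsxWriter or DataFrame.to_excel passes this check, because those libraries write the whole ZIP/XML structure for you.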

Related

Pandas dataframe to specific sheet in a excel file without losing formatting

I have a dataframe like as shown below
Date,cust,region,Abr,Number,
12/01/2010,Company_Name,Somecity,Chi,36,
12/02/2010,Company_Name,Someothercity,Nyc,156,
df = pd.read_clipboard(sep=',')
I would like to write this dataframe to a specific sheet (called temp_data) in the file output.xlsx.
Therefore I tried the code below:
import pandas
from openpyxl import load_workbook
book = load_workbook('output.xlsx')
writer = pandas.ExcelWriter('output.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
I also tried the below
path = 'output.xlsx'
with pd.ExcelWriter(path) as writer:
    writer.book = openpyxl.load_workbook(path)
    final_df.to_excel(writer, sheet_name='temp_data',startrow=10)
writer.save()
But I am not sure whether I am overcomplicating it. I get the error shown below, although I verified in Task Manager that no Excel file or task is running.
BadZipFile: File is not a zip file
Moreover, I also lose the formatting of the output.xlsx file when I manage to write the file based on the suggestions below. I already have a neatly formatted file (fonts, colors, etc.) and just need to put the data inside.
Is there any way to write the pandas dataframe to a specific sheet in an existing Excel file WITHOUT LOSING THE FORMATTING OF THE DESTINATION FILE?
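You just need to use to_excel on the pandas dataframe.
Try the snippet below:
df1.to_excel("output.xlsx",sheet_name='Sheet_name')
If the sheet already contains data, append below it instead (this needs openpyxl, and pandas 1.4 or later for if_sheet_exists='overlay'):
# count the rows already on the sheet
reader = pd.read_excel('output.xlsx', sheet_name='Sheet_name')
# mode='a' opens the existing workbook instead of overwriting it
with pd.ExcelWriter('output.xlsx', engine='openpyxl', mode='a', if_sheet_exists='overlay') as writer:
    df.to_excel(writer, sheet_name='Sheet_name', index=False, header=False, startrow=len(reader)+1)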
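If the goal is only to drop values into cells that are already formatted, another option is to skip ExcelWriter and set the cell values with openpyxl directly. The sketch below is an illustration under assumptions: write_df_to_sheet is a made-up helper, and the "template" is built on the fly in a temporary directory so the example runs on its own. openpyxl keeps the cell styles it can read, but features it does not support may still be lost on save.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font


def write_df_to_sheet(path, df, sheet_name, start_row=1, start_col=1):
    """Write df (header plus rows) into an existing sheet, changing cell values only."""
    book = load_workbook(path)
    sheet = book[sheet_name]
    rows = [list(df.columns)] + [list(r) for r in df.itertuples(index=False, name=None)]
    for r_offset, row in enumerate(rows):
        for c_offset, value in enumerate(row):
            sheet.cell(row=start_row + r_offset, column=start_col + c_offset, value=value)
    book.save(path)


# Build a small formatted stand-in for output.xlsx.
path = os.path.join(tempfile.mkdtemp(), "output.xlsx")
template = Workbook()
sheet = template.active
sheet.title = "temp_data"
sheet["A1"].font = Font(bold=True)  # pretend this is the existing formatting
template.save(path)

df = pd.DataFrame({"cust": ["Company_Name"], "Number": [36]})
write_df_to_sheet(path, df, "temp_data")

check = load_workbook(path)["temp_data"]
print(check["A1"].value, check["A1"].font.bold, check["B2"].value)
```

Because only the value of each cell is assigned, the font that was set on A1 before the write is still there afterwards.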
Are you restricted to using pandas or openpyxl?
Because if you're comfortable using other libraries, the easiest way is probably using win32com to puppet excel as if you were a user manually copying and pasting the information over.
import pandas as pd
import io
import win32com.client as win32
import os
csv_text = """Date,cust,region,Abr,Number
12/01/2010,Company_Name,Somecity,Chi,36
12/02/2010,Company_Name,Someothercity,Nyc,156"""
df = pd.read_csv(io.StringIO(csv_text),sep = ',')
temp_path = r"C:\Users\[User]\Desktop\temp.xlsx" #temporary location where to write this dataframe
df.to_excel(temp_path,index = False) #temporarily write this file to excel, change the output path as needed
excel = win32.Dispatch("Excel.Application")
excel.Visible = True #Switch these attributes to False if you'd prefer Excel to be invisible while executing this script
excel.ScreenUpdating = True
temp_wb = excel.Workbooks.Open(temp_path)
temp_ws = temp_wb.Sheets("Sheet1")
output_path = r"C:\Users\[User]\Desktop\output.xlsx" #Path to your output excel file
output_wb = excel.Workbooks.Open(output_path)
output_ws = output_wb.Sheets("Output_sheet")
temp_ws.Range('A1').CurrentRegion.Copy(Destination = output_ws.Range('A1')) # Feel free to modify the Cell where you'd like the data to be copied to
input('Check that output looks like you expected\n') # Added pause here to make sure script doesn't overwrite your file before you've looked at the output
temp_wb.Close()
output_wb.Close(True) #Close output workbook and save changes
excel.Quit() #Close excel
os.remove(temp_path) #Delete temporary excel file
Let me know if this achieves what you were after.
I spent all day on this (and a co-worker of mine spent even longer). Thankfully, it seems to work for my purposes: pasting a dataframe into an Excel sheet without changing any of the Excel source formatting. It requires the pywin32 package, which "drives" Excel as if it were a user, through the same object model that VBA uses.
import pandas as pd
from win32com import client
# Grab your source data any way you please - I'm defining it manually here:
df = pd.DataFrame([
['LOOK','','','','','','','',''],
['','MA!','','','','','','',''],
['','','I pasted','','','','','',''],
['','','','into','','','','',''],
['','','','','Excel','','','',''],
['','','','','','without','','',''],
['','','','','','','breaking','',''],
['','','','','','','','all the',''],
['','','','','','','','','FORMATTING!']
])
# Copy the df to clipboard, so we can later paste it as text.
df.to_clipboard(index=False, header=False)
excel_app = client.gencache.EnsureDispatch("Excel.Application") # Initialize instance
wb = excel_app.Workbooks.Open("Template.xlsx") # Load your (formatted) template workbook
ws = wb.Worksheets(1) # First worksheet becomes active - you could also refer to a sheet by name
ws.Range("A3").Select() # Only select a single cell using Excel nomenclature, otherwise this breaks
ws.PasteSpecial(Format='Unicode Text') # Paste as text
wb.SaveAs("Updated Template.xlsx") # Save our work
excel_app.Quit() # End the Excel instance
In general, when using the win32com approach, it's helpful to record yourself (with a macro) doing what you want to accomplish in Excel, and then read the generated macro code. Often this will give you excellent clues as to which commands you could invoke.
The solution to your problem exists here: How to save a new sheet in an existing excel file, using Pandas?
To add a new sheet from a df:
import pandas as pd
import os
import numpy as np
os.chdir(r'C:\workdir')
path = 'output.xlsx'
### replace with your df ###
x = np.random.randn(100, 2)
df = pd.DataFrame(x)
# mode='a' opens the existing workbook and adds a sheet (requires openpyxl);
# by default this raises an error if a sheet named 'x' already exists
with pd.ExcelWriter(path, engine = 'openpyxl', mode = 'a') as writer:
    df.to_excel(writer, sheet_name = 'x')
You can try xltpl.
Create a template file based on your output.xlsx file.
Render a file with your data.
from xltpl.writerx import BookWriterx
writer = BookWriterx('template.xlsx')
d = {'rows': df.values}
d['tpl_name'] = 'tpl_sheet'
d['sheet_name'] = 'temp_data'
writer.render_sheet(d)
d['tpl_name'] = 'other_sheet'
d['sheet_name'] = 'other'
writer.render_sheet(d)
writer.save('out.xls')
See examples.

How to properly save excel file using python?

I have a problem when I'm trying to save and then read an Excel file in Python. This is my function:
import openpyxl
import xlrd
from xlutils.copy import copy
import pandas as pd
def write_excel():
    wb = openpyxl.load_workbook('8de69ccb60047ce5.xlsx')
    sheet = wb.active
    sheet['D18'] = 3
    wb.save('8de69ccb60047ce5.xls')
    df1 = pd.read_excel('8de69ccb60047ce5.xls', sheet_name='Лист1', header=None, skiprows=1, usecols="H,I")
    print(df1)
    workbook = xlrd.open_workbook('8de69ccb60047ce5.xls')
    worksheet = workbook.sheet_by_index(0)
    print(worksheet.cell(17, 8).value)
    print(worksheet.cell(18, 8).value)
I'm changing cell D18, saving the file and then trying to read other cells that contain formulas, but I get nothing (cells without formulas are read correctly).
But if I open the file manually and save it in Excel, the same lines of code read those cells correctly.
The problem is this line: wb.save('8de69ccb60047ce5.xls'). It saves the changes, but it doesn't save the file correctly (I don't know how to describe it). How can I read a cell with a formula after changing the file in Python?
openpyxl only writes the .xlsx format, so save the file with an .xlsx extension using the save function:
wb.save(filename = 'sample_book.xlsx')
For more info check out this link: https://www.soudegesu.com/en/post/python/create-excel-with-openpyxl/#save-file
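The "formula cells come back empty until Excel re-saves the file" symptom can be reproduced in a few lines. openpyxl stores a formula as text and does not evaluate it, so a file it has just written carries no cached result for that cell. This is a minimal sketch with a made-up workbook in a temporary directory:

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "sample_book.xlsx")

wb = Workbook()
ws = wb.active
ws["A1"] = 3
ws["A2"] = 4
ws["A3"] = "=SUM(A1:A2)"  # stored as a formula string; openpyxl does not evaluate it
wb.save(path)

formula_view = load_workbook(path)                 # default: formulas as text
cached_view = load_workbook(path, data_only=True)  # cached results, if any were stored

print(formula_view.active["A3"].value)  # the formula text
print(cached_view.active["A3"].value)   # None until a spreadsheet application recalculates and saves
```

Readers that return cached values, such as pandas.read_excel, see the same empty result, which matches the behaviour described in the question. Opening and saving the file in Excel fills in the cached values.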

Reading and updating sheets in an XLSM file using pandas while preserving the VBA code

I have a requirement to read an xlsm file and update some of the sheets in the file. I want to use pandas for this purpose.
I tried answers presented in the following post. I couldn't see the VBA macros when I add the VBA project back.
https://stackoverflow.com/posts/28170939/revisions
Here are the steps I tried,
Extracted the VBA_project.bin out of the original.xlsm file and then
writer = pd.ExcelWriter('original.xlsx', engine='xlsxwriter')
workbook = writer.book
workbook.filename = 'test.xlsm'
workbook.add_vba_project('vbaProject.bin')
writer.save()
With this I don't see the VBA macros attached to "test.xlsm". The result is the same even if I write it to the "original.xlsm" file.
How do I preserve the VBA macros or add them back to the original xlsm file?
Also, is there a way I can open the "xlsm" file itself rather than the "xlsx" counterpart using pd.ExcelWriter?
You can read the sheets easily with pandas:
import pandas as pd
# sheet_name=None reads ALL sheets of the XLSM file into a dict of DataFrames keyed by sheet name
df = pd.read_excel('YourFile.xlsm', sheet_name=None, engine='openpyxl')
# prints all sheets
print(df)
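Reading with pandas does not by itself answer the "preserve the VBA" part. One option worth trying is openpyxl's keep_vba=True flag, which asks openpyxl to carry the macro project over when the workbook is saved again. The sketch below is an assumption-laden illustration: update_macro_workbook is a made-up helper, and the stand-in workbook contains no real macros, so whether every macro-related part of your own file survives is something to verify on that file.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def update_macro_workbook(src_path, dst_path, sheet_name, cell, value):
    """Change one cell in an .xlsm file while asking openpyxl to keep the VBA project."""
    book = load_workbook(src_path, keep_vba=True)  # keep_vba retains the macro parts on save
    book[sheet_name][cell] = value
    book.save(dst_path)  # keep the .xlsm extension for the output


# Stand-in workbook so the sketch runs; it has no real macros in it.
work_dir = tempfile.mkdtemp()
src = os.path.join(work_dir, "original.xlsm")
dst = os.path.join(work_dir, "test.xlsm")
demo = Workbook()
demo.active.title = "Sheet1"
demo.save(src)

update_macro_workbook(src, dst, "Sheet1", "A1", "updated")
print(load_workbook(dst)["Sheet1"]["A1"].value)
```

DataFrame values can be written the same way, cell by cell, before saving, which avoids rebuilding the workbook from scratch with ExcelWriter.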
Ah, I see. I still can't tell what you are doing, but here are a few general samples of code to get Python to communicate with Excel.
Read contents of a worksheet in Excel:
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
df = pd.read_excel('C:\\your_path\\test.xls', sheet_name='Sheet1')
************************************************************************************
Use Python to run Macros in Excel:
import os
import win32com.client
#Launch Excel and open the workbook
xl=win32com.client.Dispatch("Excel.Application")
xl.Workbooks.Open(Filename=r"C:\your_path\excelsheet.xlsm") #raw string keeps the backslashes literal
#Run Macro
xl.Application.Run("excelsheet.xlsm!modulename.macroname")
#Save Document and Quit.
xl.Application.Save()
xl.Application.Quit()
#Cleanup the com reference.
del xl
Write, from Python, to Excel:
import xlsxwriter
# Create an new Excel file and add a worksheet.
workbook = xlsxwriter.Workbook('C:/your_path/ranges_and_offsets.xlsx')
worksheet = workbook.add_worksheet()
# Widen the first column to make the text clearer.
worksheet.set_column('A:A', 20)
# Add a bold format to use to highlight cells.
bold = workbook.add_format({'bold': True})
# Write some simple text.
worksheet.write('A1', 'Hello')
# Text with formatting.
worksheet.write('A2', 'World', bold)
# Write some numbers, with row/column notation.
worksheet.write(2, 0, 123)
worksheet.write(3, 0, 123.456)
workbook.close()
from openpyxl import Workbook
wb = Workbook()
# grab the active worksheet
ws = wb.active
# Data can be assigned directly to cells
ws['A1'] = 42
# Rows can also be appended
ws.append([1, 2, 3])
# Python types will automatically be converted
import datetime
ws['A2'] = datetime.datetime.now()
# Save the file
wb.save("C:\\your_path\\sample.xlsx")

Write data into existing excel file and making summary table

I have to write some data into an existing xls file (I should say that I'm working on Unix and can't use Windows).
I prefer to work with Python and have tried some libraries such as xlwt, openpyxl and xlutils.
It's not working because there is a filter in my xls file. After rewriting the file, the filter disappears, but I still need it.
Could someone tell me what options I have?
Help, please!
Example:
from xlutils.copy import copy
from xlrd import open_workbook
from xlwt import easyxf
start_row=0
rb=open_workbook('file.xls')
r_sheet=rb.sheet_by_index(1)
wb=copy(rb)
w_sheet=wb.get_sheet(1)
for row_index in range(start_row, r_sheet.nrows):
    row=r_sheet.row_values(row_index)
    call_index=0
    for c_el in row:
        value=r_sheet.cell(row_index, call_index).value
        w_sheet.write(row_index, call_index, value)
        call_index+=1
wb.save('file.out.xls')
I also tried:
import xlrd
from openpyxl import Workbook
import unicodedata
rb=xlrd.open_workbook('file.xls')
sheet=rb.sheet_by_index(0)
wb=Workbook()
ws1=wb.create_sheet("Results", 0)
for rownum in range(sheet.nrows):
    row=sheet.row_values(rownum)
    arr=[]
    for c_el in row:
        arr.append(c_el)
    ws1.append(arr)
ws2=wb.create_sheet("Common", 1)
sheet=rb.sheet_by_index(1)
for rownum in range(sheet.nrows):
    row=sheet.row_values(rownum)
    arr=[]
    for c_el in row:
        arr.append(c_el)
    ws2.append(arr)
ws2.auto_filter.ref=["A1:A15", "B1:B15"]
#ws['A1']=42
#ws.append([1,2,3])
wb.save('sample.xls')
The problem still exists. OK, I'll try to find a machine running Windows, but I have to admit something else:
There are some rows like this:
[image]
I've understood what I was doing wrong, but I still need help.
First of all, I have one sheet that contains some values.
The second sheet contains a summary table!
If I try to copy this worksheet, it comes out wrong.
So, the question is: how can I build the summary table from the first sheet?
Suppose your existing Excel file has two columns (date and number).
This is how you can append an additional row using openpyxl.
import openpyxl
import datetime
wb = openpyxl.load_workbook('existing_data_file.xlsx')
sheet = wb['Sheet1']
# Rows and columns are 1-based; write to the first row after the existing data
a = sheet.max_row + 1
sheet.cell(row=a,column=1).value=datetime.date.today()
sheet.cell(row=a,column=2).value=30378
wb.save('existing_data_file.xlsx')
If you are on Windows, I would suggest you take a look at using the win32com.client approach. This allows you to interact with your spreadsheet using Excel itself. This will ensure that any existing filters, images, tables, macros etc should be preserved.
The following example opens an XLS file, adds one entry and saves the whole workbook as a different XLS-formatted file:
import win32com.client as win32
import os
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(r'input.xls')
ws = wb.Worksheets(1)
# Write a value at A1
ws.Range("A1").Value = "Hello World"
excel.DisplayAlerts = False # Allow file overwrite
wb.SaveAs(r'sample.xls', FileFormat=56)
excel.Application.Quit()
Note: make sure you use full paths for your input and output files.

Data not present in excel sheet

I'm reading an existing Excel file with the openpyxl package and trying to save it. The file gets saved, but when I open it afterwards no data is present. I used the following code, and my requirement is to open the file in use_iterators = True mode only.
from openpyxl import load_workbook
wb = load_workbook(filename = 'large_file.xlsx', use_iterators = True)
ws = wb.get_sheet_by_name(name = 'big_data')
for row in ws.iter_rows():
    for cell in row:
        print cell.internal_value
wb.save("large_file.xlsx")
Can you show how to save the file and close it after saving without losing the data?
Try loading with use_iterators = False. With use_iterators = True the workbook is loaded in a streaming, read-only fashion, so it may not contain all the information you wish to save.
openpyxl writes an entirely new Excel file based on the information it has read in, so it's not as if you make a small change and just update the file. (This also means that features openpyxl does not support, such as VB macros, won't exist in the file you've saved.)
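A common way to combine the two needs is to stream the rows in one pass and do any modification in a second, normal load. In current openpyxl releases the streaming flag is spelled read_only=True (treat that as an assumption if you are pinned to the old version that still accepts use_iterators). The file below is a small made-up stand-in for large_file.xlsx:

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "large_file.xlsx")

# Small stand-in for the real file.
seed = Workbook()
sheet = seed.active
sheet.title = "big_data"
sheet.append(["id", "value"])
sheet.append([1, 10])
seed.save(path)

# Pass 1: stream the rows in read-only mode (light on memory, not meant to be saved).
reader = load_workbook(path, read_only=True)
rows = [row for row in reader["big_data"].iter_rows(values_only=True)]
reader.close()

# Pass 2: load normally when the file has to be modified and saved.
book = load_workbook(path)
book["big_data"].append([2, 20])
book.save(path)

print(rows)
print(load_workbook(path)["big_data"].max_row)
```

The read-only workbook is closed explicitly and never saved; only the normally loaded workbook is written back, so the existing rows stay in the file.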
