Getting sheet names from a large Excel file

I am using Python 3.7 and openpyxl to read the sheet names from a large Excel workbook (29 MB) with 10 tabs.
import openpyxl
from openpyxl import load_workbook
wb = load_workbook(filename='h:\\Master_Portfoliio.xlsx')
print(wb.sheetnames)
The code above works for smaller files, but with this file it just hangs. I would like to read the sheet names, then remove a tab, and then copy a tab from another Excel workbook into this workbook.

Have you tried the read_only=True flag?
wb = load_workbook(filename='h:\\Master_Portfoliio.xlsx', read_only=True)
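A minimal sketch of both steps from the question, assuming openpyxl 2.6 or later (for `values_only`). The helper names `list_sheet_names` and `copy_sheet_values` are hypothetical. Read-only mode is enough for listing names, but a read-only workbook cannot be saved, so the workbook you want to remove a tab from still has to be loaded normally. openpyxl's own `copy_worksheet()` only copies within one workbook, so this sketch copies cell values only (no formatting) from the other file.

```python
from openpyxl import load_workbook


def list_sheet_names(path):
    """Return the sheet names without loading every cell into memory."""
    wb = load_workbook(filename=path, read_only=True)
    try:
        return wb.sheetnames
    finally:
        # Read-only workbooks keep the file open until they are closed explicitly.
        wb.close()


def copy_sheet_values(src_path, sheet_name, dst_wb, new_title=None):
    """Copy the cell values (not the formatting) of one sheet into dst_wb."""
    src_wb = load_workbook(filename=src_path, read_only=True)
    try:
        src_ws = src_wb[sheet_name]
        dst_ws = dst_wb.create_sheet(title=new_title or sheet_name)
        # values_only=True yields plain tuples of values instead of cell objects.
        for row in src_ws.iter_rows(values_only=True):
            dst_ws.append(row)
        return dst_ws
    finally:
        src_wb.close()
```

In use, you would list the names with `list_sheet_names()`, load the target workbook without `read_only`, drop a tab with `wb.remove(wb[name])`, call `copy_sheet_values()` with the other file and sheet, and finally `wb.save(...)`.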

Related

openpyxl save workbook to file with path

I use openpyxl in Python to write some data to an Excel file using Workbook.
When I want to save the workbook to a file, I can only provide a filename argument, so I cannot write files to other directories.
Here is my sample code:
import openpyxl
file_name = "sample.xlsx"
wb = openpyxl.Workbook()
# writing some data to workbook
wb.save(filename=file_name)
I have checked the documentation and found nothing more.
Python: 3.10.7
Openpyxl: 3.0.10
Can you suggest a workaround for this problem?

You can pass a full path plus the filename, and the file will be created where you need it:
from openpyxl import Workbook
wb = Workbook()
wb.save(r'D:\folder\folder\folder\Filename.xlsx')
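One thing to watch: `Workbook.save()` just opens the given path for writing, so if the folder does not exist yet you get an error rather than a new folder. A small sketch under that assumption; `save_workbook_to` is a hypothetical helper, and `pathlib` is used to build the full path portably.

```python
from pathlib import Path

from openpyxl import Workbook


def save_workbook_to(wb, directory, filename):
    """Save wb as directory/filename, creating the directory first if needed."""
    target_dir = Path(directory)
    # save() does not create missing folders, so create them up front.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    wb.save(str(target))
    return target
```

For example, `save_workbook_to(Workbook(), r"D:\folder\subfolder", "sample.xlsx")` would write the file there and return its path.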

Creating workbook and worksheet using openpyxl

I am trying to load an existing Excel file and create a new sheet inside that workbook using openpyxl, but my code is not working:
from openpyxl import load_workbook
rb = load_workbook(r"C:\Raw_Dump.xlsx")
rb.create_sheet("Sheet2")
sheet1 = rb.worksheets[0]
Any help would be appreciated.

You have to save the workbook to the same filename:
rb.save(r"C:\Raw_Dump.xlsx")
Full working example:
import openpyxl
ws_name = r"Raw_Dump.xlsx"
rb = openpyxl.load_workbook(ws_name)
rb.create_sheet("Sheet2")
rb.save(ws_name)
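The key point is that `create_sheet()` only changes the workbook in memory; nothing reaches the file until `save()` is called. A sketch with a hypothetical helper name, `add_sheet_and_save`; the optional `index` argument of `create_sheet()` controls where the new tab is placed.

```python
from openpyxl import load_workbook


def add_sheet_and_save(path, title, index=None):
    """Add a sheet called title to the workbook at path and write it back."""
    wb = load_workbook(path)
    if title in wb.sheetnames:
        # Otherwise openpyxl may quietly rename the new sheet to avoid a duplicate.
        raise ValueError(f"sheet {title!r} already exists")
    wb.create_sheet(title=title, index=index)  # index=None appends at the end
    wb.save(path)  # without this, the new sheet is lost when the script ends
    return wb.sheetnames
```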

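I spent a long time searching for this and found that the best way is to remove the sheets you don't need. The code below worked for me:
from openpyxl import load_workbook
wb = load_workbook("input.xlsx")  # hypothetical input file
# Keep only the sheets listed here; every other sheet is removed.
sheets_to_keep = ["MY_SHEET_I_WANNA_KEEP"]
for sheet in wb.sheetnames:  # sheetnames returns a fresh list, so removing while looping is safe
    if sheet not in sheets_to_keep:
        wb.remove(wb[sheet])  # remove() replaces the deprecated remove_sheet()
wb.active = 0  # make sure the active-sheet index still points at an existing sheet
wb.save("JustOneSheet.xlsx")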

openpyxl: remove_sheet causes IndexError: list index out of range error on saving sheet

I am trying to use openpyxl to:
Open an Excel (2016) workbook that contains 3 worksheets (Sheet1, Sheet2, Sheet3)
Remove a worksheet (Sheet2)
Save the workbook to a different workbook minus Sheet2
from openpyxl import load_workbook
wb = load_workbook("c:/Users/me/book1.xlsx")
ws = wb.get_sheet_by_name('Sheet2')
wb.remove_sheet(ws)
wb.save("c:/Users/me/book2.xlsx")
The wb.save call raises IndexError: list index out of range and produces a corrupted book2.xlsx file that Excel cannot open.

I ran into a similar problem, only with the xlwt library. Either way, the cause is the same: you removed the sheet that is set as the active sheet. To fix this, set some other sheet as active before saving the workbook. In openpyxl it would look something like this:
from openpyxl import load_workbook
wb = load_workbook("c:/Users/me/book1.xlsx")
ws = wb.get_sheet_by_name('Sheet2')
wb.remove_sheet(ws)
wb._active_sheet_index = 0
wb.save("c:/Users/me/book2.xlsx")
I must mention that this is not very good programming practice, since it sets a private attribute, but there is no method to set the active sheet, only to get it.
EDIT: I just found out that this repository moved to Bitbucket, and that it does have a way to set the active sheet. Just use:
wb.active = 0
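Putting the answer together with the non-deprecated API (`wb[name]` and `wb.remove()` instead of `get_sheet_by_name()` and `remove_sheet()`), a sketch might look like this; the function name is hypothetical. It removes the sheet, then points the active index at a sheet that still exists before saving.

```python
from openpyxl import load_workbook


def remove_sheet_and_save(src_path, sheet_name, dst_path):
    """Save a copy of src_path without sheet_name, keeping a valid active sheet."""
    wb = load_workbook(src_path)
    wb.remove(wb[sheet_name])
    # The removed sheet may have been the active one; make the first remaining
    # sheet active so the saved file does not refer to a missing tab.
    wb.active = 0
    wb.save(dst_path)
    return wb.sheetnames
```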

Writing into existing excel file

I have an .xlsx file that contains multiple worksheets (with some content). I want to write some data into specific sheets, say sheet1 and sheet5. Right now I am doing it using xlrd, xlwt, and the xlutils copy() function. But is there any way to do it by opening the file in append mode, adding the data, and saving it (like we do for text/CSV files)?
Here is my code:
from xlrd import open_workbook
from xlutils.copy import copy
# Raw strings (r"...") keep "\t" in the path from being read as a tab character
rb = open_workbook(r"C:\text.xlsx", formatting_info=True)
wb = copy(rb)
Sheet1 = wb.get_sheet(8)
Sheet2 = wb.get_sheet(7)
Sheet1.write(0, 8, 'Obtained_Value')
Sheet2.write(0, 8, 'Obtained_Value')
value1 = [1, 2, 3, 4]
value2 = [5, 6, 7, 8]
for i in range(len(value1)):
    Sheet1.write(i + 1, 8, value1[i])
for j in range(len(value2)):
    Sheet2.write(j + 1, 8, value2[j])
wb.save(r"C:\text.xlsx")

You can do it with either the openpyxl module or the xlwings module.
Using openpyxl
from openpyxl import load_workbook  # pip install openpyxl
wb = load_workbook(r"C:\text.xlsx")
sheets = wb.sheetnames
Sheet1 = wb[sheets[8]]
Sheet2 = wb[sheets[7]]
# Then update as you want
Sheet1.cell(row=2, column=4).value = 4  # This changes cell (2, 4), i.e. D2, to 4
wb.save("HERE PUT THE NEW EXCEL PATH")
The text.xlsx file is used as a template: all the values from text.xlsx, together with the updated values, are saved to the new file.
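For reference, the question's xlwt loop could be ported to openpyxl roughly as follows. This is a sketch and the helper names are hypothetical. One detail that is easy to miss: xlwt's `write(row, col, ...)` counts from 0, while openpyxl's `cell(row=..., column=...)` counts from 1, so the question's column 8 becomes column 9 (column I).

```python
from openpyxl import load_workbook


def write_column(ws, column, header, values):
    """Put header in row 1 of column and values in the rows beneath it (1-based)."""
    ws.cell(row=1, column=column, value=header)
    for row, value in enumerate(values, start=2):
        ws.cell(row=row, column=column, value=value)


def update_workbook(src_path, dst_path):
    """Mirror the question: write into the 9th and 8th sheets, save to dst_path."""
    wb = load_workbook(src_path)
    sheets = wb.sheetnames
    write_column(wb[sheets[8]], 9, "Obtained_Value", [1, 2, 3, 4])
    write_column(wb[sheets[7]], 9, "Obtained_Value", [5, 6, 7, 8])
    wb.save(dst_path)
```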
Using xlwings
import xlwings
wb = xlwings.Book(r"C:\text.xlsx")
Sheet1 = wb.sheets[8]
Sheet2 = wb.sheets[7]
# Then update as you want
Sheet1.range((2, 4)).value = 4  # This changes cell (2, 4), i.e. D2, to 4
wb.save()
wb.close()
Here text.xlsx itself is updated in place. If you want to keep the original and work on a copy, copy the file first:
import shutil
shutil.copy(r"C:\text.xlsx", r"C:\newFile.xlsx")  # copies text.xlsx to newFile.xlsx
and then use
wb = xlwings.Book(r"C:\newFile.xlsx") instead of wb = xlwings.Book(r"C:\text.xlsx")
Having used both modules, I prefer the second one (xlwings).

For manipulating existing Excel files you should use openpyxl. The other common libraries, like the ones you are using, don't support modifying existing Excel files in place. A workaround is to:
save your output file under a different name - text_temp.xlsx
delete your original file - text.xlsx
rename your output file from text_temp.xlsx to text.xlsx
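The last two steps can be collapsed with `os.replace()`, which moves the temporary file over the original in a single call (atomically when both are on the same filesystem). A minimal sketch; `save_via_temp` is a hypothetical helper, and `write` stands for whatever saves your workbook to a given path, for example the `save` method of an xlwt or openpyxl workbook.

```python
import os
import tempfile


def save_via_temp(write, path):
    """Write a new version of path via a temporary file, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    suffix = os.path.splitext(path)[1]
    # Create the temp file next to the target so os.replace() stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=suffix, dir=directory)
    os.close(fd)
    try:
        write(tmp_path)             # step 1: save the output under a different name
        os.replace(tmp_path, path)  # steps 2 and 3: overwrite the original in one move
    except BaseException:
        # Leave the original untouched and clean up the partial temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With a workbook object `wb`, the call would be `save_via_temp(wb.save, r"C:\text.xlsx")`.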

python xlrd/xlwt create new workbook using sheets from 2 different workbooks preserving formatting

First let me explain my terminology. An Excel workbook has sheets. E.g. a new Excel workbook contains by default 3 sheets.
Using xlrd, xlwt and xlutils, I want to create a new workbook (say, file3) from 3 sheets of file1 and 1 sheet of file2, preserving formatting as much as possible. I am using the following code (you have to create file1 and file2 yourself; just fill them with numbers AND text):
import os
import xlrd
import xlwt
from xlutils.copy import copy as xlutils_copy
from copy import deepcopy as deep_copy
new_workbook = xlwt.Workbook()
with xlrd.open_workbook("file1.xls", formatting_info=True) as rb1:
    wb1 = xlutils_copy(rb1)
    allSheets = []
    allSheets.append(wb1.get_sheet(0))
    allSheets.append(wb1.get_sheet(1))
    allSheets.append(wb1.get_sheet(2))
    extra = deep_copy(wb1.get_sheet(1))
    allSheets.append(extra)
    allSheets[-1].name = 'extra sheet file1'
with xlrd.open_workbook("file2.xls", formatting_info=True) as rb2:
    wb2 = xlutils_copy(rb2)
    extra2 = deep_copy(wb2.get_sheet(0))
    allSheets.append(extra2)
    allSheets[-1].name = 'extra sheet file2'
new_workbook._Workbook__worksheets = allSheets
outputFile = "file3.xls"
new_workbook.save(outputFile)
os.startfile(outputFile)
The problem is that when I open file3.xls, Excel reports 'File error: data may have been lost.' After clicking 'OK' and inspecting the file, I see a lot of #VALUE! errors. Column widths etc. have been preserved, but fonts and colors have not. Remarkably, numbers have been copied perfectly, but text has not. Does anyone have a clue what is going wrong?
