Iterate through columns in Read-only workbook in openpyxl - python

I have a somewhat large .xlsx file - 19 columns, 5185 rows. I want to open the file, read all the values in one column, do some stuff to those values, and then create a new column in the same workbook and write out the modified values. Thus, I need to be able to both read and write in the same file.
My original code did this:
def readExcel(doc):
    wb = load_workbook(generalpath + exppath + doc)
    ws = wb["Sheet1"]
    # iterate through the columns to find the correct one
    for col in ws.iter_cols(min_row=1, max_row=1):
        for mycell in col:
            if mycell.value == "PerceivedSound.RESP":
                origCol = mycell.column
    # get the column letter for the first empty column to output the new values
    newCol = utils.get_column_letter(ws.max_column+1)
    # iterate through the rows to get the value from the original column,
    # do something to that value, and output it in the new column
    for myrow in range(2, ws.max_row+1):
        myrow = str(myrow)
        # do some stuff to make the new value
        cleanedResp = doStuff(ws[origCol + myrow].value)
        ws[newCol + myrow] = cleanedResp
    wb.save(doc)
However, python threw a memory error after row 3853 because the workbook was too big. The openpyxl docs said to use Read-only mode (https://openpyxl.readthedocs.io/en/latest/optimized.html) to handle big workbooks. I'm now trying to use that; however, there seems to be no way to iterate through the columns when I add the read_only = True param:
def readExcel(doc):
    wb = load_workbook(generalpath + exppath + doc, read_only=True)
    ws = wb["Sheet1"]
    for col in ws.iter_cols(min_row=1, max_row=1):
        #etc.
python throws this error:
AttributeError: 'ReadOnlyWorksheet' object has no attribute 'iter_cols'
If I change the final line in the above snippet to:
for col in ws.columns:
python throws the same error:
AttributeError: 'ReadOnlyWorksheet' object has no attribute 'columns'
Iterating over rows is fine (and is included in the documentation I linked above):
for col in ws.rows:
(no error)
This question asks about the AttributeError, but the solution there is to remove read-only mode, which doesn't work for me because openpyxl can't load my entire workbook outside read-only mode.
So: how do I iterate through columns in a large workbook?
And I haven't yet encountered this, but I will once I can iterate through the columns: how do I both read and write the same workbook, if said workbook is large?
Thanks!

If the worksheet has only around 100,000 cells then you shouldn't have any memory problems. You should probably investigate this further.
iter_cols() is not available in read-only mode because it requires constant and very inefficient reparsing of the underlying XML file. It is, however, relatively easy to convert the rows from iter_rows() into columns using zip:
import types

def _iter_cols(self, min_col=None, max_col=None, min_row=None,
               max_row=None, values_only=False):
    yield from zip(*self.iter_rows(
        min_row=min_row, max_row=max_row,
        min_col=min_col, max_col=max_col, values_only=values_only))

for sheet in workbook:
    sheet.iter_cols = types.MethodType(_iter_cols, sheet)
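The zip idea can be tried without openpyxl. This is only a minimal sketch: the sample rows are made up and stand in for the value tuples that iter_rows(values_only=True) would yield, and iter_columns is a hypothetical helper name.

```python
# Made-up sample rows, standing in for what iter_rows(values_only=True) yields.
rows = [
    ("Subject", "PerceivedSound.RESP", "RT"),
    (1, "loud", 512),
    (2, "quiet", 430),
]

def iter_columns(row_iter):
    """Transpose an iterable of equal-length rows into column tuples."""
    return zip(*row_iter)

columns = list(iter_columns(rows))
# Map each header (first cell of a column) to the remaining values of that column.
header_to_column = {col[0]: col[1:] for col in columns}
print(header_to_column["PerceivedSound.RESP"])
```

Note that zip(*rows) has to consume every row before it can yield the first column, so the selected range is held in memory while the columns are built.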

According to the documentation, ReadOnly mode only supports row-based reads (column reads are not implemented). But that's not hard to solve:
from openpyxl import Workbook

wb2 = Workbook(write_only=True)
ws2 = wb2.create_sheet()
# find what column I need
colcounter = 0
for row in ws.rows:
    for cell in row:
        if cell.value == "PerceivedSound.RESP":
            break
        colcounter += 1
    # cells are apparently linked to the parent workbook meta
    # this will retain only values; you'll need custom
    # row constructor if you want to retain more
    row2 = [cell.value for cell in row]
    ws2.append(row2)  # preserve the first row in the new file
    break  # stop after first row
# start at row 2 so the header row is not processed and written a second time
for row in ws.iter_rows(min_row=2):
    row2 = [cell.value for cell in row]
    row2.append(doStuff(row2[colcounter]))
    ws2.append(row2)  # write a new row to the new wb
wb2.save('newfile.xlsx')
wb.close()
wb2.close()
# copy `newfile.xlsx` to `generalpath + exppath + doc`
# Either using os.system, subprocess.Popen, or shutil.copy2()
You will not be able to write to the same workbook, but as shown above you can open a new workbook (in write-only mode), write to it, and then overwrite the old file with an OS-level copy.
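The final "overwrite the old file" step might look like the sketch below. It uses os.replace (a move) instead of the copy mentioned above, which is a substitution on my part; the file names and the replace_file helper are hypothetical, and the demo runs in a temporary directory with dummy bytes rather than real workbooks.

```python
import os
import tempfile

def replace_file(new_path, original_path):
    """Move new_path over original_path (both assumed to be on the same filesystem)."""
    os.replace(new_path, original_path)

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.xlsx")
    rewritten = os.path.join(tmp, "newfile.xlsx")
    # Dummy contents standing in for the old and the freshly written workbook.
    with open(original, "wb") as fh:
        fh.write(b"old contents")
    with open(rewritten, "wb") as fh:
        fh.write(b"new contents")

    replace_file(rewritten, original)

    with open(original, "rb") as fh:
        result = fh.read()
    leftover_exists = os.path.exists(rewritten)
```

If the new file should be kept as well, shutil.copy2(new_path, original_path) copies instead of moving.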

Related

Copy column of cell values from one workbook to another with openpyxl

I am extracting data from one workbook's column and need to copy the data to another existing workbook.
This is how I extract the data (works fine):
wb2 = load_workbook('C:\\folder\\AllSitesOpen2.xlsx')
ws2 = wb2['report1570826222449']
# Extract column A from Open Sites
DateColumnA = []
for row in ws2.iter_rows(min_row=16, max_row=None, min_col=1, max_col=1):
    for cell in row:
        DateColumnA.append(cell.value)
DateColumnA
The above code successfully outputs the cell values in each row of the first column to DateColumnA
I'd like to paste the values stored in DateColumnA to this existing destination workbook:
#file to be pasted into
wb3 = load_workbook('C:\\folder\\output.xlsx')
ws3 = wb3['Sheet1']
But I am missing a piece conceptually here. I can't connect the dots. Can someone advise how I can get this data from my source workbook to the new destination workbook?
Let's say you want to copy the column starting in cell 'A1' of 'Sheet1' in wb3:
wb3 = load_workbook('C:\\folder\\output.xlsx')
ws3 = wb3['Sheet1']
for counter in range(len(DateColumnA)):
    cell_id = 'A' + str(counter + 1)
    ws3[cell_id] = DateColumnA[counter]
wb3.save('C:\\folder\\output.xlsx')
I ended up getting this to write the list to another pre-existing spreadsheet:
for x, rows in enumerate(DateColumnA):
    ws3.cell(row=x+1, column=1).value = rows
    #print(rows)
wb3.save('C:\\folder\\output.xlsx')
Works great but now I need to determine how to write the data to output.xlsx starting at row 16 instead of row 1 so I don't overwrite the first 16 existing header rows in output.xlsx. Any ideas appreciated.
I figured out a more concise way to write the source data to a different starting row on the destination sheet in a different workbook. I do not need to dump the values into a list as I did above. iter_rows does all the work, and each cell's value can be written straight to the other workbook's worksheet:
row_offset=5
for rows in ws2.iter_rows(min_row=2, max_row=None, min_col=1, max_col=1):
    for cell in rows:
        ws3.cell(row=cell.row + row_offset, column=1, value=cell.value)
wb3.save('C:\\folder\\DestFile.xlsx')
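For the earlier list-based version, one way to begin writing at row 16 is the start argument of enumerate. The following is a sketch only: START_ROW, rows_with_targets, and the sample dates are hypothetical, and the openpyxl call appears only in a comment.

```python
START_ROW = 16  # assumed first row to write to, below the existing header block

def rows_with_targets(values, start_row=START_ROW):
    """Pair each value with the 1-based sheet row it should be written to."""
    return list(enumerate(values, start=start_row))

date_column = ["2019-10-01", "2019-10-02", "2019-10-03"]  # made-up sample data
targets = rows_with_targets(date_column)
for row_number, value in targets:
    # With openpyxl this would be: ws3.cell(row=row_number, column=1, value=value)
    print(row_number, value)
```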

openpyxl - "copy/paste" range of cells

I'm new to Python and I'm trying to adapt some of my VBA code to it using the openpyxl library. On this particular case, I'm trying to copy 468 rows in a single column from a workbook according to the string in the header and to paste them in another workbook in a particular column that has another specific string as a header. I can't simply select the range of cells I want to copy because this is part of a report automation and the headers change positions from file to file.
What's the function I need to use to copy each of the 468 cells from one workbook into the 468 cells of the second workbook? Or alternatively how can I copy a range of cells and then paste them in another workbook? Here is my code and I know exactly what's wrong: I'm copying one cell (the last from the first workbook) repeatedly into the 468 cells of the second workbook.
#!/usr/bin/python3
import pdb
import openpyxl
from openpyxl.utils import column_index_from_string

wb1 = openpyxl.load_workbook('.../Extraction.xlsx')
wb2 = openpyxl.load_workbook('.../Template.xlsx')
ws1 = wb1.active
first_row1 = list(ws1.rows)[0] #to select the first row (header)
for cell in first_row1:
    if cell.value == "email":
        x = cell.column #to get the column
        y = column_index_from_string(x) #to get the column's index
        for i in range(2, 469):
            cell_range1 = ws1.cell(i, y) #the wrong part
ws2 = wb2.active
first_row2 = list(ws2.rows)[0]
for cell in first_row2:
    if cell.value == "emailAddress":
        w = cell.column
        z = column_index_from_string(w)
        for o in range(2, 469):
            cell_range2 = ws2.cell(o, z)
            cell_range2.value = cell_range1.value
path = '.../Test.xlsx'
wb2.save(path)
It is actually quite easy to create such a function:
from openpyxl.utils import rows_from_range
def copy_range(range_str, src, dst):
    for row in rows_from_range(range_str):
        for cell in row:
            dst[cell].value = src[cell].value
    return
Note that range_str is a regular string such as "A1:B2", and src and dst both have to be valid sheet objects. However, if you are copying large ranges, this might take a while, as the reads and writes seem to be rather time-consuming.
Check the argument order of .cell(): positionally it is .cell(row, column). To avoid any confusion, use the keywords: .cell(row=o, column=z).
You need a dynamic index for both of the row iterators, while keeping the column indices where you found them:
for o in range(2, 469):
    #note the common o for both, could also be o+1 for one if there is an offset
    ws2.cell(o, z).value = ws1.cell(o, y).value
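Since the headers move between files, the column indices y and z have to be looked up first. A minimal sketch of that lookup on plain lists of header values (find_column and the sample headers are hypothetical; with openpyxl the values would come from the first row of each sheet):

```python
def find_column(header_values, name):
    """Return the 1-based column index of `name` in a header row."""
    for index, value in enumerate(header_values, start=1):
        if value == name:
            return index
    raise ValueError(f"header {name!r} not found")

source_header = ["id", "email", "country"]      # made-up header of the source sheet
target_header = ["emailAddress", "firstName"]   # made-up header of the template sheet

y = find_column(source_header, "email")
z = find_column(target_header, "emailAddress")
print(y, z)
```

Raising on a missing header, instead of silently leaving the index undefined, makes a changed report layout fail early.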

How to copy worksheet from one workbook to another one using openpyxl?

I have a large number of Excel files (about 200), and I would like to copy one specific worksheet from one workbook to another. I have done some investigation and couldn't find a way of doing it with openpyxl.
This is the code I have developed so far:
def copy_sheet_to_different_EXCEL(path_EXCEL_read,Sheet_name_to_copy,path_EXCEL_Save,Sheet_new_name):
    ''' Function used to copy one EXCEL sheet into another file.
    def path_EXCEL_read,Sheet_name_to_copy,path_EXCEL_Save,Sheet_new_name
    Input data:
    1.) path_EXCEL_read: the location of the EXCEL file along with the name where the information is going to be saved
    2.) Sheet_name_to_copy= The name of the EXCEL sheet to copy
    3.) path_EXCEL_Save: The path of the EXCEL file where the sheet is going to be copied
    4.) Sheet_new_name: The name of the new EXCEL sheet
    Output data:
    1.) Status= If 0, everything went OK. If 1, one error occurred.
    Version History:
    1.0 (2017-02-20): Initial version.
    '''
    status=0
    if(path_EXCEL_read.endswith('.xls')==1):
        print('ERROR - EXCEL xls file format is not supported by openpyxl. Please, convert the file to an XLSX format')
        status=1
        return status
    try:
        wb = openpyxl.load_workbook(path_EXCEL_read,read_only=True)
    except:
        print('ERROR - EXCEL file does not exist in the following location:\n {0}'.format(path_EXCEL_read))
        status=1
        return status
    Sheet_names=wb.get_sheet_names() # We compare against the sheet name we would like to copy
    if ((Sheet_name_to_copy in Sheet_names)==0):
        print('ERROR - EXCEL sheet "{0}" does not exist'.format(Sheet_name_to_copy))
        status=1
        return status
    # We check if the destination file exists
    if (os.path.exists(path_EXCEL_Save)==1):
        #If true, file exist so we open it
        if(path_EXCEL_Save.endswith('.xls')==1):
            print('ERROR - Destination EXCEL xls file format is not supported by openpyxl. Please, convert the file to an XLSX format')
            status=1
            return status
        try:
            wdestiny = openpyxl.load_workbook(path_EXCEL_Save)
        except:
            print('ERROR - Destination EXCEL file does not exist in the following location:\n {0}'.format(path_EXCEL_Save))
            status=1
            return status
        #we check if the destination sheet exists. If so, we will delete it
        destination_list_sheets = wdestiny.get_sheet_names()
        if((Sheet_new_name in destination_list_sheets) ==True):
            print('WARNING - Sheet "{0}" exists in: {1}. It will be deleted!'.format(Sheet_new_name,path_EXCEL_Save))
            # remove_sheet() expects a worksheet object, not a sheet name
            wdestiny.remove_sheet(wdestiny.get_sheet_by_name(Sheet_new_name))
    else:
        wdestiny=openpyxl.Workbook()
    # We copy the Excel sheet
    try:
        sheet_to_copy = wb.get_sheet_by_name(Sheet_name_to_copy)
        target = wdestiny.copy_worksheet(sheet_to_copy)
        target.title=Sheet_new_name
    except:
        print('ERROR - Could not copy the EXCEL sheet. Check the file')
        status=1
        return status
    try:
        wdestiny.save(path_EXCEL_Save)
    except:
        print('ERROR - Could not save the EXCEL sheet. Check the file permissions')
        status=1
        return status
    #Program finishes
    return status
I had the same problem. For me style, format, and layout were very important. Moreover, I did not want to copy formulas but only the values (of the formulas). After a lot of trial and error, and Stack Overflow, I came up with the following functions. It may look a bit intimidating, but the code copies a sheet from one Excel file to another (possibly existing) file while preserving:
font and color of text
filled color of cells
merged cells
comment and hyperlinks
format of the cell value
the width of every row and column
whether or not row and column are hidden
frozen rows
It is useful when you want to gather sheets from many workbooks and bind them into one workbook. I copied most attributes but there might be a few more. In that case you can use this script as a jumping off point to add more.
###############
## Copy a sheet with style, format, layout, etc. from one Excel file to another Excel file
## Please add the ..path\\+\\file.. and ..sheet_name.. according to your desire.
import openpyxl
from copy import copy

def copy_sheet(source_sheet, target_sheet):
    copy_cells(source_sheet, target_sheet)  # copy all the cell values and styles
    copy_sheet_attributes(source_sheet, target_sheet)

def copy_sheet_attributes(source_sheet, target_sheet):
    target_sheet.sheet_format = copy(source_sheet.sheet_format)
    target_sheet.sheet_properties = copy(source_sheet.sheet_properties)
    target_sheet.merged_cells = copy(source_sheet.merged_cells)
    target_sheet.page_margins = copy(source_sheet.page_margins)
    target_sheet.freeze_panes = copy(source_sheet.freeze_panes)
    # set row dimensions
    # So you cannot copy the row_dimensions attribute. Does not work (because of meta data in the attribute I think). So we copy every row's row_dimensions. That seems to work.
    for rn in range(len(source_sheet.row_dimensions)):
        target_sheet.row_dimensions[rn] = copy(source_sheet.row_dimensions[rn])
    if source_sheet.sheet_format.defaultColWidth is None:
        print('Unable to copy default column width')
    else:
        target_sheet.sheet_format.defaultColWidth = copy(source_sheet.sheet_format.defaultColWidth)
    # set specific column width and hidden property
    # we cannot copy the entire column_dimensions attribute so we copy selected attributes
    for key, value in source_sheet.column_dimensions.items():
        target_sheet.column_dimensions[key].min = copy(source_sheet.column_dimensions[key].min)  # Excel actually groups multiple columns under 1 key. Use the min max attribute to also group the columns in the targetSheet
        target_sheet.column_dimensions[key].max = copy(source_sheet.column_dimensions[key].max)  # https://stackoverflow.com/questions/36417278/openpyxl-can-not-read-consecutive-hidden-columns discussed the issue. Note that this is also the case for the width, not only the hidden property
        target_sheet.column_dimensions[key].width = copy(source_sheet.column_dimensions[key].width)  # set width for every column
        target_sheet.column_dimensions[key].hidden = copy(source_sheet.column_dimensions[key].hidden)

def copy_cells(source_sheet, target_sheet):
    for (row, col), source_cell in source_sheet._cells.items():
        target_cell = target_sheet.cell(column=col, row=row)
        target_cell._value = source_cell._value
        target_cell.data_type = source_cell.data_type
        if source_cell.has_style:
            target_cell.font = copy(source_cell.font)
            target_cell.border = copy(source_cell.border)
            target_cell.fill = copy(source_cell.fill)
            target_cell.number_format = copy(source_cell.number_format)
            target_cell.protection = copy(source_cell.protection)
            target_cell.alignment = copy(source_cell.alignment)
        if source_cell.hyperlink:
            target_cell._hyperlink = copy(source_cell.hyperlink)
        if source_cell.comment:
            target_cell.comment = copy(source_cell.comment)

wb_target = openpyxl.Workbook()
target_sheet = wb_target.create_sheet(..sheet_name..)
wb_source = openpyxl.load_workbook(..path\\+\\file_name.., data_only=True)
source_sheet = wb_source[..sheet_name..]
copy_sheet(source_sheet, target_sheet)
if 'Sheet' in wb_target.sheetnames:  # remove default sheet
    wb_target.remove(wb_target['Sheet'])
wb_target.save('out.xlsx')
I found a way by playing around with it (note that it relies on private attributes and methods):
import openpyxl
xl1 = openpyxl.load_workbook('workbook1.xlsx')
# sheet you want to copy
s = openpyxl.load_workbook('workbook2.xlsx').active
s._parent = xl1
xl1._add_sheet(s)
xl1.save('some_path/name.xlsx')
You cannot use copy_worksheet() to copy between workbooks because it depends on global constants that may vary between workbooks. The only safe and reliable way to proceed is to go row-by-row and cell-by-cell.
You might want to read the discussions about this feature
For speed I am using data_only and read_only attributes when opening my workbooks. Also iter_rows() is really fast, too.
# Oscar's excellent answer needs some changes to support ReadOnlyWorksheet and EmptyCell
# Copy a sheet with style, format, layout, etc. from one Excel file to another Excel file
# Please add the ..path\\+\\file.. and ..sheet_name.. according to your desire.
import openpyxl
from copy import copy

def copy_sheet(source_sheet, target_sheet):
    copy_cells(source_sheet, target_sheet)  # copy all the cell values and styles
    copy_sheet_attributes(source_sheet, target_sheet)

def copy_sheet_attributes(source_sheet, target_sheet):
    if isinstance(source_sheet, openpyxl.worksheet._read_only.ReadOnlyWorksheet):
        return
    target_sheet.sheet_format = copy(source_sheet.sheet_format)
    target_sheet.sheet_properties = copy(source_sheet.sheet_properties)
    target_sheet.merged_cells = copy(source_sheet.merged_cells)
    target_sheet.page_margins = copy(source_sheet.page_margins)
    target_sheet.freeze_panes = copy(source_sheet.freeze_panes)
    # set row dimensions
    # So you cannot copy the row_dimensions attribute. Does not work (because of meta data in the attribute I think). So we copy every row's row_dimensions. That seems to work.
    for rn in range(len(source_sheet.row_dimensions)):
        target_sheet.row_dimensions[rn] = copy(source_sheet.row_dimensions[rn])
    if source_sheet.sheet_format.defaultColWidth is None:
        print('Unable to copy default column width')
    else:
        target_sheet.sheet_format.defaultColWidth = copy(source_sheet.sheet_format.defaultColWidth)
    # set specific column width and hidden property
    # we cannot copy the entire column_dimensions attribute so we copy selected attributes
    for key, value in source_sheet.column_dimensions.items():
        target_sheet.column_dimensions[key].min = copy(source_sheet.column_dimensions[key].min)  # Excel actually groups multiple columns under 1 key. Use the min max attribute to also group the columns in the targetSheet
        target_sheet.column_dimensions[key].max = copy(source_sheet.column_dimensions[key].max)  # https://stackoverflow.com/questions/36417278/openpyxl-can-not-read-consecutive-hidden-columns discussed the issue. Note that this is also the case for the width, not only the hidden property
        target_sheet.column_dimensions[key].width = copy(source_sheet.column_dimensions[key].width)  # set width for every column
        target_sheet.column_dimensions[key].hidden = copy(source_sheet.column_dimensions[key].hidden)

def copy_cells(source_sheet, target_sheet):
    for r, row in enumerate(source_sheet.iter_rows()):
        for c, cell in enumerate(row):
            source_cell = cell
            if isinstance(source_cell, openpyxl.cell.read_only.EmptyCell):
                continue
            target_cell = target_sheet.cell(column=c+1, row=r+1)
            target_cell._value = source_cell._value
            target_cell.data_type = source_cell.data_type
            if source_cell.has_style:
                target_cell.font = copy(source_cell.font)
                target_cell.border = copy(source_cell.border)
                target_cell.fill = copy(source_cell.fill)
                target_cell.number_format = copy(source_cell.number_format)
                target_cell.protection = copy(source_cell.protection)
                target_cell.alignment = copy(source_cell.alignment)
            if not isinstance(source_cell, openpyxl.cell.ReadOnlyCell) and source_cell.hyperlink:
                target_cell._hyperlink = copy(source_cell.hyperlink)
            if not isinstance(source_cell, openpyxl.cell.ReadOnlyCell) and source_cell.comment:
                target_cell.comment = copy(source_cell.comment)
With a usage something like
wb = Workbook()
wb_source = load_workbook(filename, data_only=True, read_only=True)
for sheetname in wb_source.sheetnames:
    source_sheet = wb_source[sheetname]
    ws = wb.create_sheet("Orig_" + sheetname)
    copy_sheet(source_sheet, ws)
wb.save(new_filename)
I had a similar requirement to collate data from multiple workbooks into one workbook. As there are no built-in methods for this in openpyxl, I created the script below to do the job for me.
Note: In my use case all workbooks contain data in the same format.
from openpyxl import load_workbook
import os

# The below method is used to read data from an active worksheet and store it in memory.
def reader(file):
    global path
    abs_file = os.path.join(path, file)
    wb_sheet = load_workbook(abs_file).active
    rows = []
    # min_row is set to 2, to ignore the first row which contains the headers
    for row in wb_sheet.iter_rows(min_row=2):
        row_data = []
        for cell in row:
            row_data.append(cell.value)
        # custom column data I am adding, not needed for typical use cases
        row_data.append(file[17:-6])
        # Creating a list of lists, where each list contain a typical row's data
        rows.append(row_data)
    return rows

if __name__ == '__main__':
    # Folder in which my source excel sheets are present
    path = r'C:\Users\tom\Desktop\Qt'
    # To get the list of excel files
    files = os.listdir(path)
    for file in files:
        rows = reader(file)
        # below mentioned file name should be already created
        book = load_workbook('new.xlsx')
        sheet = book.active
        for row in rows:
            sheet.append(row)
        book.save('new.xlsx')
My workaround goes like this:
You have a template file let's say it's "template.xlsx".
You open it, make changes to it as needed, save it as a new file, close the file.
Repeat as needed. Just make sure to keep a copy of the original template while testing/messing around.
I've just found this question. A good workaround, as mentioned here, could consist of modifying the original wb in memory and then saving it under another name. For example:
import openpyxl
# your starting wb with 2 Sheets: Sheet1 and Sheet2
wb = openpyxl.load_workbook('old.xlsx')
sheets = wb.sheetnames # ['Sheet1', 'Sheet2']
for s in sheets:
    if s != 'Sheet2':
        # remove every sheet except Sheet2
        wb.remove(wb[s])
# your final wb with just Sheet2
wb.save('new.xlsx')
A workaround I use is saving the current sheet as a pandas data frame and loading it to the excel workbook you need
It can actually be done in a very simple way.
It needs just 3 steps:
Open a file using load_workbook:
wb = load_workbook('File_1.xlsx')
Select the sheet you want to copy:
ws = wb.active
Save the workbook under the name of the new file:
wb.save('New_file.xlsx')
This saves the workbook loaded from the first file (File_1.xlsx), including the selected sheet, as the second file (New_file.xlsx).

How to read a particular cell by using "wb = load_workbook('path', True)" in openpyxl

Hi there,
I have written code for reading large Excel files, but my requirement is to read a particular cell, e.g. cell(row, column), in an Excel file when I pass True
in wb = load_workbook('Path', True).
Can anybody please help me?
CODE:
from openpyxl import load_workbook
wb = load_workbook('Path', True)
sheet_ranges = wb.get_sheet_by_name(name = 'Global')
for row in sheet_ranges.iter_rows():
    for cell in row:
        print(cell.internal_value)
Since you are using an Optimized Reader, you cannot just access an arbitrary cell using ws.cell(row, column).value:
cell, range, rows, columns methods and properties are disabled
The optimized reader was designed and created specifically for reading an unlimited amount of data from an Excel file by using iterators.
Basically you should iterate over rows and cells until you get the necessary cell. Here's a simple example:
for r, row in enumerate(sheet_ranges.iter_rows()):
    if r == 10:
        for c, cell in enumerate(row):
            if c == 5:
                print(cell.internal_value)
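The same "iterate until you reach the cell" idea can be written with itertools.islice so that iteration stops as soon as the wanted row has been read. This is a sketch on a stand-in generator, not on openpyxl itself: cell_at and fake_rows are hypothetical names, and the rows are plain tuples of values.

```python
from itertools import islice

def cell_at(row_iter, row_index, col_index):
    """Return the value at zero-based (row_index, col_index), or None if out of range."""
    # islice skips the rows before row_index and stops right after the wanted row.
    row = next(islice(row_iter, row_index, row_index + 1), None)
    if row is None or col_index >= len(row):
        return None
    return row[col_index]

def fake_rows(n_rows, n_cols):
    """Stand-in for a row iterator; yields tuples of made-up cell values."""
    for r in range(n_rows):
        yield tuple(f"r{r}c{c}" for c in range(n_cols))

value = cell_at(fake_rows(20, 8), 10, 5)
print(value)
```

The iterator is consumed up to the requested row, so a second lookup needs a fresh iterator.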
You can find the answer here.
I recommend you consult the documentation first before asking a question on SO.
In particular, this is pretty much exactly what you want:
d = ws.cell(row = 4, column = 2)
where ws is a worksheet.

Is it possible to get an Excel document's row count without loading the entire document into memory?

I'm working on an application that processes huge Excel 2007 files, and I'm using OpenPyXL to do it. OpenPyXL has two different methods of reading an Excel file - one "normal" method where the entire document is loaded into memory at once, and one method where iterators are used to read row-by-row.
The problem is that when I'm using the iterator method, I don't get any document meta-data like column widths and row/column count, and i really need this data. I assume this data is stored in the Excel document close to the top, so it shouldn't be necessary to load the whole 10MB file into memory to get access to it.
So, is there a way to get ahold of the row/column count and column widths without loading the entire document into memory first?
Adding on to what Hubro said, apparently get_highest_row() has been deprecated. Using the max_row and max_column properties returns the row and column count. For example:
wb = load_workbook(path, use_iterators=True)
sheet = wb.worksheets[0]
row_count = sheet.max_row
column_count = sheet.max_column
The solution suggested in this answer has been deprecated, and might no longer work.
Taking a look at the source code of OpenPyXL (IterableWorksheet) I've figured out how to get the column and row count from an iterator worksheet:
wb = load_workbook(path, use_iterators=True)
sheet = wb.worksheets[0]
row_count = sheet.get_highest_row() - 1
column_count = letter_to_index(sheet.get_highest_column()) + 1
IterableWorksheet.get_highest_column returns a string with the column letter that you can see in Excel, e.g. "A", "B", "C" etc. Therefore I've also written a function to translate the column letter to a zero based index:
def letter_to_index(letter):
    """Converts a column letter, e.g. "A", "B", "AA", "BC" etc. to a zero based
    column index.
    A becomes 0, B becomes 1, Z becomes 25, AA becomes 26 etc.
    Args:
        letter (str): The column index letter.
    Returns:
        The column index as an integer.
    """
    letter = letter.upper()
    result = 0
    for index, char in enumerate(reversed(letter)):
        # Get the ASCII number of the letter and subtract 64 so that A
        # corresponds to 1.
        num = ord(char) - 64
        # Multiply the number with 26 to the power of `index` to get the correct
        # value of the letter based on its index in the string.
        final_num = (26 ** index) * num
        result += final_num
    # Subtract 1 from the result to make it zero-based before returning.
    return result - 1
I still haven't figured out how to get the column sizes though, so I've decided to use a fixed-width font and automatically scaled columns in my application.
Python 3:
import openpyxl as xl
wb = xl.load_workbook("Sample.xlsx", read_only=True)
# the two lines below select the same sheet (assuming the first sheet is named 'sheet')
sheet = wb['sheet']
sheet = wb.worksheets[0]
row_count = sheet.max_row
column_count = sheet.max_column
# this works for me.
This might be extremely convoluted and I might be missing the obvious, but without OpenPyXL filling in the column_dimensions in Iterable Worksheets (see my comment above), the only way I can see of finding the column size without loading everything is to parse the xml directly:
from xml.etree.ElementTree import iterparse
from openpyxl import load_workbook
wb=load_workbook("/path/to/workbook.xlsx", use_iterators=True)
ws=wb.worksheets[0]
xml = ws._xml_source
xml.seek(0)
for _,x in iterparse(xml):
    name= x.tag.split("}")[-1]
    if name=="col":
        print("Column %(max)s: Width: %(width)s"%x.attrib) # width = x.attrib["width"]
    if name=="cols":
        print("break before reading the rest of the file")
        break
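The same parsing idea can be tried on a self-contained XML snippet. This is a sketch under assumptions: SHEET_XML is a hand-written, simplified stand-in for a worksheet part, read_sheet_metadata is a hypothetical helper, and the dimension element is not guaranteed to be present in every file. It listens for "start" events so that it can stop as soon as sheetData begins, before any row is parsed.

```python
import io
from xml.etree.ElementTree import iterparse

# Hand-written, simplified stand-in for a worksheet XML part.
SHEET_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <dimension ref="A1:S5185"/>
  <cols>
    <col min="1" max="1" width="12.5" customWidth="1"/>
    <col min="2" max="3" width="20" customWidth="1"/>
  </cols>
  <sheetData/>
</worksheet>"""

def read_sheet_metadata(stream):
    """Return (dimension_ref, [(min_col, max_col, width), ...]) without reading rows."""
    dimension = None
    widths = []
    for _event, element in iterparse(stream, events=("start",)):
        name = element.tag.split("}")[-1]  # strip the XML namespace
        if name == "dimension":
            dimension = element.attrib.get("ref")
        elif name == "col":
            attrs = element.attrib
            widths.append((int(attrs["min"]), int(attrs["max"]), float(attrs["width"])))
        elif name == "sheetData":
            break  # the cell data starts here; stop before parsing it
    return dimension, widths

dimension, widths = read_sheet_metadata(io.BytesIO(SHEET_XML))
print(dimension, widths)
```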
If you use pyexcel, you can call row_range() on a sheet to get the range of rows, and from that the maximum row count.
See https://pythonhosted.org/pyexcel/iapi/pyexcel.sheets.Sheet.html (row_range(): utility function to get the row range).
Tested on Python 3.4.
Options using pandas.
Gets all sheetnames with count of rows and columns.
import pandas as pd
xl = pd.ExcelFile('file.xlsx')
sheetnames = xl.sheet_names
for sheet in sheetnames:
    df = xl.parse(sheet)
    dimensions = df.shape
    print(sheet, ' --> ', dimensions)
Single sheet count of rows and columns.
import pandas as pd
xl = pd.ExcelFile('file.xlsx')
sheetnames = xl.sheet_names
df = xl.parse(sheetnames[0]) # [0] get first tab/sheet.
dimensions = df.shape
print(f'sheetname: "{sheetnames[0]}" - -> {dimensions}')
output sheetname "Sheet1" --> (row count, column count)
