I would like to improve my code, replacing this:
workbook = xlwt.Workbook()
sheet = workbook.add_sheet("WS")
header = [u'Nome da estação',u'Altitude',u'Latitude',u'Longitude']
column = 0
for h in header:
    sheet.write(0, column, h)
    column += 1
with some code that uses the header list directly to write an entire row. Any ideas?
You are unlikely to get any actual improvement from writing your data as a row unit because Excel stores cells individually either way. That is probably why there is no such method documented for xlwt.
You can cut your code down by a couple of lines by using enumerate:
workbook = xlwt.Workbook()
sheet = workbook.add_sheet("WS")
header = [u'Nome da estação',u'Altitude',u'Latitude',u'Longitude']
for column, heading in enumerate(header):
    sheet.write(0, column, heading)
If you find yourself doing this sort of thing regularly, write a small utility method:
def write_header(header, row=0, start_col=0):
    for column, heading in enumerate(header, start_col):
        sheet.write(row, column, heading)
workbook = xlwt.Workbook()
sheet = workbook.add_sheet("WS")
write_header([u'Nome da estação',u'Altitude',u'Latitude',u'Longitude'])
The additional parameters will allow you to set the upper-left corner of the header in the spreadsheet should you ever need to. The default values should cover 99% of use cases.
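A variation on the helper above, sketched under the assumption that you would rather pass the sheet in than rely on a module-level variable. The names write_row and RecordingSheet are hypothetical; RecordingSheet is only a stand-in so the snippet runs without xlwt installed, and any object with a write(row, col, value) method (such as an xlwt worksheet) should work in its place.

```python
def write_row(sheet, row, values, start_col=0):
    """Write each item of `values` into consecutive cells of one row."""
    for column, value in enumerate(values, start_col):
        sheet.write(row, column, value)


class RecordingSheet(object):
    """Stand-in for a worksheet: remembers what was written where."""

    def __init__(self):
        self.cells = {}

    def write(self, row, column, value):
        self.cells[(row, column)] = value


sheet = RecordingSheet()
header = [u'Nome da estação', u'Altitude', u'Latitude', u'Longitude']

# Header in the first row, starting at the first column.
write_row(sheet, 0, header)

# Same header again, but with its upper-left corner at row 2, column 1.
write_row(sheet, 2, header, start_col=1)
```

With a real workbook you would pass the object returned by workbook.add_sheet("WS") instead of the stand-in.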
There's only sheet.write(), which writes a single cell at the given row_index and column_index.
If you are worried about speed or optimization, just focus on optimizing the for loops, as you would for any other program flow.
workbook.save() is required only once at the end -- so the file I/O still happens only once.
I have a very large excel file that I'm dealing with in python. I have a column where every cell is a different formula. I want to copy the formulas and paste them one column over from column GD to GE.
The issue is that I want the formulas to update like they do in Excel; it's just that Excel takes a very long time to copy/paste because the file I'm working with is very large.
Any ideas on possibly how to use openpyxl's translator to do this or anything else?
from openpyxl import load_workbook
import pandas as pd
#loads the excel file and is now saved under workbook#
workbook = load_workbook('file.xlsx')
#uses the individual sheets index(first sheet = 0) to work on one sheet at a time#
sheet= workbook.worksheets[8]
#inserts a column at specified index number#
sheet.insert_cols(187)
#naming the new columns#
sheet['GE2']= '20220531'
here is my updated code
from openpyxl import load_workbook
from openpyxl.formula.translate import Translator
#loads the excel file and is now saved under workbook#
workbook = load_workbook('file.xlsx')
#uses the individual sheets index(first sheet = 0) to work on one sheet at a time#
sheet= workbook.worksheets[8]
formula = sheet['GD3'].value
new_formula = Translator(formula, origin= 'GE3').translate_formula("GD3")
sheet['GD2'] = new_formula
for row in sheet.iter_rows(min_col=187, max_col=188):
    old, new = row
    if new.data_type != "f":
        continue
    new_formula = Translator(new.value, origin=old.coordinate).translate_formula(new.coordinate)
workbook.save('file.xlsx')
When you add or remove columns and rows, Openpyxl does not manage formulae for you. The reason for this is simple: where should it stop? Managing a "dependency graph" is exactly the kind of functionality that an application like MS Excel provides.
But it is quite easy to do this in your own code using the Formula Translator:
# insert the column
formula = ws['GE1'].value
new_formula = Translator(formula, origin="GD1").translate_formula("GE1")
ws['GE1'] = new_formula
It should be fairly straightforward to create a loop for this (check the data type and use cell.coordinate to avoid potential typos or incorrect adjustments).
ws.insert_cols(187)
for row in ws.iter_rows(min_col=187, max_col=188):
    old, new = row
    if new.data_type != "f":
        continue
    new_formula = Translator(new.value, origin=old.coordinate).translate_formula(new.coordinate)
    new.value = new_formula  # store the translated formula back in the cell
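For the original goal of copying a column of formulas one column to the right, a minimal sketch might look like the following. It uses a small in-memory workbook with made-up formulas in columns A and B so that it runs on its own; for the sheet in the question the source and target columns would presumably be 186 (GD) and 187 (GE).

```python
from openpyxl import Workbook
from openpyxl.formula.translate import Translator

wb = Workbook()   # in-memory workbook, nothing is read from disk
ws = wb.active

# Illustrative data: column A holds the formulas to copy into column B.
ws['A1'] = '=SUM(C1:D1)'
ws['A2'] = '=C2*2'
ws['A3'] = 'plain text'   # not a formula, so it is skipped

for (source,) in ws.iter_rows(min_col=1, max_col=1, min_row=1, max_row=3):
    if source.data_type != 'f':
        continue
    target = ws.cell(row=source.row, column=source.column + 1)
    # Shift the relative references as Excel would on copy/paste.
    target.value = Translator(
        source.value, origin=source.coordinate
    ).translate_formula(target.coordinate)
```

The translator only rewrites the formula text; the values are calculated when the file is next opened in Excel.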
Complete beginner here but have a specific need to try and make my life easier with automating Excel.
I have a weekly report that contains a lot of useless columns and using Python I can delete these and rename them, with the code below.
from openpyxl import Workbook, load_workbook
wb = load_workbook('TestExcel.xlsx')
ws = wb.active
ws.delete_cols(1,3)
ws.delete_cols(3,8)
ws.delete_cols(4,3)
ws.insert_cols(3,1)
ws['A1'].value = "Full Name"
ws['C1'].value = "Email Address"
ws['C2'].value = '=B2&"#testdomain.com"'
wb.save('TestExcelUpdated.xlsx')
This does the job but I would like the formula to continue from B2 downwards (since the top row are headings).
ws['C2'].value = '=B2&"#testdomain.com"'
Obviously, in Excel it is just a case of dragging the formula down to the end of the column but I'm at a loss to get this working in Python. I've seen similar questions asked but the answers are over my head.
Would really appreciate a dummies guide.
Example of Excel report after Python code
One way to do this is by iterating over the rows in your worksheet.
for row in ws.iter_rows(min_row=2):  # min_row ensures you skip your header row
    row[2].value = '=B' + str(row[0].row) + '&"#testdomain.com"'
row[2].value selects the third column due to zero-based indexing. row[0].row gets the number of the current row.
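If the string concatenation is hard to read, the formula text can be built by a small helper instead. This is only a sketch; email_formula and its parameters are hypothetical names, not part of openpyxl.

```python
def email_formula(row_number, source_col='B', domain='#testdomain.com'):
    """Return the formula text for one row, e.g. =B2&"#testdomain.com"."""
    return '={col}{row}&"{domain}"'.format(
        col=source_col, row=row_number, domain=domain
    )


# Formula text for rows 2, 3 and 4 (row 1 holds the headings).
formulas = [email_formula(n) for n in range(2, 5)]
```

Inside the loop above, the assignment would then read row[2].value = email_formula(row[0].row).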
I have searched around, tried some win32com and some xlrd/xlwt/xlutils but all I can do is insert data into the existing Excel rows - I want to be able to insert one new row (specifically the first one, in my case). Does anyone know how to do this using Python?
As suggested, here is what I did to add a row to my Excel file:
from xlrd import open_workbook # http://pypi.python.org/pypi/xlrd
from xlutils.copy import copy # http://pypi.python.org/pypi/xlutils
from xlwt import easyxf # http://pypi.python.org/pypi/xlwt
import xlwt
...next part is indented because it's in some for loops, not good at stack overflow formatting
rb = open_workbook(os.path.join(cohort_path, f), on_demand=True, encoding_override="cp1252", formatting_info=True)
# The following is because the messed-up file has a missing row
if f == 'MessedUp.xls':
    r_sheet = rb.sheet_by_name('SHEET NAME')  # read-only copy to introspect the file
    wb = copy(rb)
    w_sheet = wb.get_sheet(rb.sheet_names().index('SHEET NAME'))  # Workaround
    # fix first rows
    for col_index in range(0, r_sheet.ncols):
        for row_index in range(2, r_sheet.nrows):
            xfx = r_sheet.cell_xf_index(row_index-1, col_index)
            xf = rb.xf_list[xfx]
            bgx = xf.background.pattern_colour_index
            xlwt.add_palette_colour("custom_colour", 0x17)
            #rb.set_colour_RGB(0x21, 251, 228, 228) #or wb??
            style_string = 'pattern: pattern solid, fore_colour custom_colour' if bgx in (55,23) else None
            style = xlwt.easyxf(style_string)
            w_sheet.write(row_index, col_index, r_sheet.cell(row_index-1,col_index).value, style=style)
    wb.save(os.path.join(cohort_path, 'fixed_copy.xls'))
xlwt helps you write to Excel files.
To write anything, you have to specify a row and a column, like this: worksheet.write(x, y, x*y)
This command writes the value x*y to the cell at row x, column y.
So, in your case, to write to a new row, just give the row number where you want the new row and write as many columns as you want. Easy.
It's not a list that you need to append to. You can jump to any cell you want and write.
Check out a useful example here - http://codingtutorials.co.uk/python-excel-xlrd-xlwt/
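Because write() only targets a cell and does not push existing cells down, one way to get a genuinely new first row is to do the shifting in Python and then write everything out again. The sketch below assumes the existing values have already been read into a list of rows (for example with xlrd); insert_row, write_rows and RecordingSheet are hypothetical names, and RecordingSheet merely stands in for an xlwt worksheet so the snippet runs on its own. Cell formatting is not carried over by this approach.

```python
def insert_row(rows, new_row, index=0):
    """Return a new list of rows with `new_row` placed at position `index`."""
    return rows[:index] + [new_row] + rows[index:]


def write_rows(sheet, rows):
    """Write a list of rows to any object with a write(row, col, value) method."""
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            sheet.write(r, c, value)


class RecordingSheet(object):
    """Stand-in for a worksheet: remembers what was written where."""

    def __init__(self):
        self.cells = {}

    def write(self, row, column, value):
        self.cells[(row, column)] = value


existing = [['a', 'b'], ['c', 'd']]          # values already in the sheet
shifted = insert_row(existing, ['new', 'row'])

sheet = RecordingSheet()
write_rows(sheet, shifted)
```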
I am trying to create an Excel workbook where I can auto-set, or auto-adjust the widths of the columns before saving the workbook.
I have been reading the Python-Excel tutorial in hopes of finding some functions in xlwt that emulate xlrd ones (such as sheet_names(), cellname(row, col), cell_type, cell_value, and so on...) For example, suppose I have the following:
from xlwt import Workbook
wb = Workbook()
sh1 = wb.add_sheet('sheet1' , cell_overwrite_ok = True)
sh2 = wb.get_sheet(0)
wb.get_sheet(0) is similar to the rb.sheet_by_index(0) function offered in xlrd, except that the former allows you to modify the contents (provided the user has set cell_overwrite_ok = True)
Assuming xlwt DOES offer the functions I am looking for, I was planning on going through every worksheet again, but this time keeping track of the content that takes the most space for a particular column, and set the column width based on that. Of course, I can also keep track of the max width for a specific column as I write to the sheet, but I feel like it would be cleaner to set the widths after all the data has been already written.
Does anyone know if I can do this? If not, what do you recommend doing in order to adjust the column widths?
I just implemented a wrapper class that tracks the widths of items as you enter them. It seems to work pretty well.
import arial10

class FitSheetWrapper(object):
    """Try to fit columns to max size of any entry.
    To use, wrap this around a worksheet returned from the
    workbook's add_sheet method, like follows:

        sheet = FitSheetWrapper(book.add_sheet(sheet_name))

    The worksheet interface remains the same: this is a drop-in wrapper
    for auto-sizing columns.
    """
    def __init__(self, sheet):
        self.sheet = sheet
        self.widths = dict()

    def write(self, r, c, label='', *args, **kwargs):
        self.sheet.write(r, c, label, *args, **kwargs)
        width = arial10.fitwidth(label)
        if width > self.widths.get(c, 0):
            self.widths[c] = width
            self.sheet.col(c).width = width

    def __getattr__(self, attr):
        return getattr(self.sheet, attr)
All the magic is in John Yeung's arial10 module. This has good widths for Arial 10, which is the default Excel font. If you want to write worksheets using other fonts, you'll need to change the fitwidth function, ideally taking into account the style argument passed to FitSheetWrapper.write.
If you would rather not use another class (FitSheetWrapper), the same thing can be done with the worksheet's col() method:
import xlwt
work = xlwt.Workbook()
sheet = work.add_sheet('Sheet1')
for row_index in range(0, max_row):
    for column_index in range(0, max_col):
        # column_data is the value to be written to this cell
        cwidth = sheet.col(column_index).width
        if (len(column_data) * 367) > cwidth:
            # widen the column to fit the longest value seen so far
            sheet.col(column_index).width = len(column_data) * 367
        sheet.write(row_index, column_index, column_data, style)
The default width is 2962 units, which Excel displays as 8.11. Hence I multiply the length of the data by 367.
This is adapted from Kevin's FitSheetWrapper.
There is no automatic facility for this in xlwt. You have to follow the general pattern you describe, of keeping track of the max width as you're writing, and setting the column width at the end, sometime after you've seen all the data but before you've saved the workbook.
Note that this is the cleanest and most efficient approach available when dealing with Excel files. If your notion of "after the data has already been written" means after you've already committed the cell values ("writing") but before actually saving the workbook, then the method described above is doing exactly this. If what you mean is after you've already saved the workbook, you want to read it again to get the max widths, and then save it again with new column widths, this will be much slower, and will involve using both xlwt and xlrd (and possibly xlutils as well). Also note that when you are using the genuine Microsoft Excel, there is no notion of "updating" a file. It may seem like that from a user point of view, but what is happening behind the scenes is that every time you do a save, Excel blows away the existing file and writes a brand new one from scratch.
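The "track as you write, set at the end" pattern can be sketched without any Excel library at all. The class below is a hypothetical helper, not part of xlwt; it assumes a simple estimate of 256 width units per character (the figure used elsewhere in this thread) and caps the result at 65535 on the assumption that the width is stored as an unsigned 16-bit value.

```python
class ColumnWidthTracker(object):
    """Remember the longest value seen in each column."""

    UNITS_PER_CHAR = 256   # assumed width units per character
    MAX_WIDTH = 65535      # assumed upper limit (unsigned 16-bit)

    def __init__(self):
        self.max_chars = {}

    def note(self, column, value):
        """Record the text length of a value about to be written."""
        length = len(str(value))
        if length > self.max_chars.get(column, 0):
            self.max_chars[column] = length

    def widths(self, padding_chars=1):
        """Return {column: width} once all data has been seen."""
        return dict(
            (col, min((chars + padding_chars) * self.UNITS_PER_CHAR,
                      self.MAX_WIDTH))
            for col, chars in self.max_chars.items()
        )


tracker = ColumnWidthTracker()
for row in [['id', 'name'], [1, 'Longitude']]:
    for column, value in enumerate(row):
        tracker.note(column, value)   # call this next to sheet.write(...)

widths = tracker.widths()
```

Just before saving, each entry would be applied with sheet.col(column).width = width.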
FitSheetWrapper needs a small modification to work with xlwt3 in 3.3.4.
Line 19, change:
width = arial10.fitwidth(label)
to:
width = int(arial10.fitwidth(label))
Reason, from \Python\3.3.3\Lib\site-packages\xlwt3\biffrecords.py (lines 1624-1625):
def __init__(self, first_col, last_col, width, xf_index, options):
    self._rec_data = pack('<6H', first_col, last_col, width, xf_index, options, 0)
The width must be an integer.
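The underlying behaviour can be reproduced with the standard struct module alone. A small sketch, assuming Python 3 and a made-up fractional width:

```python
import struct

width = 2962.5   # illustrative fractional width estimate

# The 'H' format code expects an integer, so a float is rejected.
try:
    struct.pack('<H', width)
    float_accepted = True
except struct.error:
    float_accepted = False

# Converting to int first makes the same call succeed.
packed = struct.pack('<H', int(width))
```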
This may be a little late, but I created a method that does this for the whole sheet at once. It's quick and gets the job done. The extraCushion parameter is only needed if you think that the 256 calculation won't be accurate (if you have longer text fields).
from xlrd import *
from xlwt import *
def autoAdjustColumns(workbook, path, writerSheet, writerSheet_index, extraCushion):
    readerSheet = open_workbook(path).sheet_by_index(writerSheet_index)
    for row in range(readerSheet.nrows):
        for column in range(readerSheet.ncols):
            thisCell = readerSheet.cell(row, column)
            neededWidth = int((1 + len(str(thisCell.value))) * 256)
            if writerSheet.col(column).width < neededWidth:
                writerSheet.col(column).width = neededWidth + extraCushion
    workbook.save(path)
I use this method:
wb = Workbook()
ws = wb.add_sheet('Sheet1')
columnwidth = {}
row = 0
for rowdata in data:
    column = 0
    for colomndata in rowdata:
        if column in columnwidth:
            if len(colomndata) > columnwidth[column]:
                columnwidth[column] = len(colomndata)
        else:
            columnwidth[column] = len(colomndata)
        ws.write(row, column, colomndata, style0)
        column = column + 1
    row = row + 1
for column, widthvalue in columnwidth.items():
    ws.col(column).width = (widthvalue + 4) * 367