Finding Excel cell reference using Python - python

Here is the Excel file in question:
Context: I am writing a program which can pull values from a PDF and put them in the appropriate cell in an Excel file.
Question: I want to write a function which takes a column value (e.g. 2014) and a row value (e.g. 'COGS') as arguments and return the cell reference where those two intersect (e.g. 'C3' for 2014 COGS).
def find_correct_cell(year=2014, item='COGS'):
    # do something similar to what the =match function in Excel does
    return cell_reference  # returns 'C3'
I have already tried using openpyxl like this to change the values of some random empty cells where I can store these values:
col_num = '=match(2014, A1:E1)'
row_num = '=match("COGS", A1:A5)'
But I want to grab those values without having to arbitrarily write to those random empty cells. Plus, even with this method, when I read those cells (F5 and F6) it reads the formulae in those cells and not the face value of 3.
Any help is appreciated, thanks.

Consider a translated VBA solution as the Match function can adequately handle your needs. Python can access the Excel VBA Object Library using a COM interface with the win32com module. Please note this solution assumes you are using Excel for PC. Below includes the counterpart VBA function.
VBA Function (native interface)
If the function below is placed in a standard Excel module, it can be called from a spreadsheet cell as =FindCell(..., ###)
' MATCHES ROW AND COL INPUT FOR CELL ADDRESS OUTPUT
Function FindCell(item As String, year As Integer) As String
    FindCell = Cells(Application.Match(item, Range("A1:A5"), 0), _
                     Application.Match(year, Range("A1:E1"), 0)).Address
End Function
Debug.Print FindCell("COGS", 2014)
' $C$3
Python Script (foreign interface, requiring all objects to be declared)
Try/Except/Finally is used to properly close the Excel process whether the script succeeds or fails.
import win32com.client
# MATCHES ROW AND COL INPUT FOR CELL ADDRESS OUTPUT
def FindCell(item, year):
    return xlWks.Cells(
        xlApp.WorksheetFunction.Match(item, xlWks.Range("A1:A5"), 0),
        xlApp.WorksheetFunction.Match(year, xlWks.Range("A1:E1"), 0)).Address
try:
    xlApp = win32com.client.Dispatch("Excel.Application")
    xlWbk = xlApp.Workbooks.Open('C:/Path/To/Workbook.xlsx')
    xlWks = xlWbk.Worksheets("SHEETNAME")
    print(FindCell("COGS", 2014))
    # $C$3
except Exception as e:
    print(e)
finally:
    xlWbk.Close(False)
    xlApp.Quit()  # the parentheses are needed to actually call Quit
    xlWks = None
    xlWbk = None
    xlApp = None

There are a surprising number of details you need to get right to manipulate Excel files this way with openpyxl. First, it's worth knowing that the xlsx file contains two representations of each cell - the formula, and the current value of the formula. openpyxl can return either, and if you want values you should specify data_only=True when you open the file. Also, openpyxl is not able to calculate a new value when you change the formula for a cell - only Excel itself can do that. So inserting a MATCH() worksheet function won't solve your problem.
The code below does what you want, mostly in Python. It uses the "A1" reference style, and does some calculations to turn column numbers into column letters. This won't hold up well if you go past column Z. In that case, you may want to switch to numbered references to rows and columns. There's some more info on that here and here. But hopefully this will get you on your way.
Note: This code assumes you are reading a workbook called 'test.xlsx', and that 'COGS' is in a list of items in 'Sheet1!A2:A5' and 2014 is in a list of years in 'Sheet1!B1:E1'.
import openpyxl
def get_xlsx_region(xlsx_file, sheet, region):
    """ Return a rectangular region from the specified file.
    The data are returned as a list of rows, where each row contains a list
    of cell values"""
    # 'data_only=True' tells openpyxl to return values instead of formulas
    # 'read_only=True' makes openpyxl much faster (fast enough that it
    # doesn't hurt to open the file once for each region).
    wb = openpyxl.load_workbook(xlsx_file, data_only=True, read_only=True)
    reg = wb[sheet][region]
    return [[cell.value for cell in row] for row in reg]
# cache the lists of years and items
# get the first (only) row of the 'B1:E1' region
years = get_xlsx_region('test.xlsx', 'Sheet1', 'B1:E1')[0]
# get the first (only) column of the 'A2:A5' region
items = [r[0] for r in get_xlsx_region('test.xlsx', 'Sheet1', 'A2:A5')]
def find_correct_cell(year, item):
    # find the indexes for 'COGS' and 2014
    year_col = chr(ord('B') + years.index(year))  # only works in A:Z range
    item_row = 2 + items.index(item)
    cell_reference = year_col + str(item_row)
    return cell_reference
print(find_correct_cell(year=2014, item='COGS'))
# C3
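The answer above warns that the chr/ord arithmetic stops working past column Z. As a rough sketch of one way around that, the column number can be converted to letters with a base-26 style loop. The helper name column_letter, the extra parameters on this variant of find_correct_cell, and the sample lists are illustrative assumptions, not part of the original answer; openpyxl also provides openpyxl.utils.get_column_letter for the same conversion.

```python
def column_letter(col_num):
    """Convert a 1-based column number to Excel letters (1 -> 'A', 27 -> 'AA')."""
    if col_num < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while col_num > 0:
        # Shift to 0-25 first so that 26 maps to 'Z' instead of rolling over
        col_num, remainder = divmod(col_num - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def find_correct_cell(year, item, years, items, first_col=2, first_row=2):
    """Return the A1-style reference where `year` (column) and `item` (row) meet.

    `years` holds the header values starting at column `first_col`;
    `items` holds the row labels starting at row `first_row`.
    """
    col = first_col + years.index(year)
    row = first_row + items.index(item)
    return column_letter(col) + str(row)


# Hypothetical data mirroring the layout assumed in the answer above
years = [2013, 2014, 2015, 2016]
items = ["Revenue", "COGS", "Gross Profit", "SG&A"]
print(find_correct_cell(2014, "COGS", years, items))  # C3
```

Passing the lists in as arguments keeps the lookup independent of how the workbook was read, so the same function works whether the values came from openpyxl or elsewhere.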

Related

In Python, is there a library to set/update the value of one Excel cell based on its name?

I have an Excel file with several named cells. I want to update these cells with a new value. Instead of using the coordinates: worksheet["D3"] = "New Excel Value". I would like to set it with its named value: worksheet["cell name"] = "New Excel Value".
What I have seen is that xlwings has a function to set the cell based on its named value. But xlwings requires Excel to be installed on the machine, which is not the case for our machine. Therefore I am looking at openpyxl. I have seen solutions that work such as 1 and 2. Both require an extra function with some manual steps to retrieve the single named cell. I expected to see the retrieval in the same way as with normal coordinates with square brackets. Which gives me the feeling that I missed something in the documentation.
Therefore I would like to know, is the best way to set the values of named cells the solution proposed in "Is there a way to save data in named Excel cells using Python?" or does openpyxl offer a function for it?
-- Edit --
The best that I could come up with is the code snippet below. I'll post it here, in case it would help anyone:
def _set_value_for_excel_named_cell(self, cell_name: str, value: Any) -> None:
    """Sets a value for the Excel cell based on its name."""
    worksheet_title, cell_coordinates = list(
        self._workbook.defined_names[cell_name].destinations
    )[0]
    self._workbook[worksheet_title][cell_coordinates] = value
You can create your own class and use the __setitem__ and __getitem__ methods to create that interface yourself by wrapping openpyxl. Using the resources you've already linked to, I've created a working example to help you get started.
from openpyxl import load_workbook
filename = "test.xlsx"
class XLWrap:
    def __init__(self, wb):
        self.wb = wb
    def __getitem__(self, key):
        # Returns the value for cells given set name
        return [
            self.wb[sheet][cell].value
            for sheet, cell in list(self.wb.defined_names[key].destinations)
        ]
    def __setitem__(self, key, dat):
        # Sets the value for cells in workbook given set name
        cells = self.wb.defined_names[key].destinations
        for sheet, cell in cells:
            ws = self.wb[sheet]
            ws[cell] = dat
        self.save()
    def save(self):
        self.wb.save(filename)
wb = load_workbook(filename)
xlw = XLWrap(wb)
print(xlw["test"])
xlw["test"] = "named_cell_val_after"
print(xlw["test"])
Output:
['named_cell_val_before']
['named_cell_val_after']
The above will not work for a range of cells. It is possible to create a wrapper to handle different types of output for named ranges which are noted in the docs.
they are very loosely defined. They might contain a constant, a formula, a single cell reference, a range of cells or multiple ranges of cells across different worksheets. Or all of the above.
So the relevant implementation will be reliant on how you're using them. For ranges of cells, I have this quick and dirty working example which then quickly falls apart if you have single cell definitions:
def __getitem__(self, key):
    # Returns the value for cells given set name
    sheets = [
        self.wb[sht][rng]
        for sht, rng in list(self.wb.defined_names[key].destinations)
    ]
    values = []
    for sheet in sheets:
        for row in sheet:
            for cell in row:
                values.append(cell.value)
    return values
def __setitem__(self, key, dat):
    # Sets the value for cells in workbook given set name
    sheets = [
        self.wb[sht][rng]
        for sht, rng in list(self.wb.defined_names[key].destinations)
    ]
    for sheet in sheets:
        for row in sheet:
            for cell in row:
                cell.value = dat
    self.save()
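The two versions above break on each other's input because a single-cell lookup hands back one cell object while a range lookup hands back nested tuples of cells. One possible way to cover both, sketched here under that assumption about what the worksheet lookup returns, is to flatten whatever comes back before touching the values. The names iter_cells and FakeCell are hypothetical; FakeCell only stands in for a real cell so the idea can be tried without a workbook.

```python
def iter_cells(selection):
    """Yield every cell from whatever a worksheet lookup returned.

    A single-cell lookup gives one object, a one-dimensional slice gives
    a tuple of cells, and a rectangular range gives a tuple of row tuples.
    Flattening recursively covers all three shapes.
    """
    if isinstance(selection, tuple):
        for part in selection:
            yield from iter_cells(part)
    else:
        yield selection


class FakeCell:
    """Stand-in for a worksheet cell; only the .value attribute matters here."""

    def __init__(self, value):
        self.value = value


single = FakeCell(1)
block = ((FakeCell(2), FakeCell(3)), (FakeCell(4), FakeCell(5)))

print([cell.value for cell in iter_cells(single)])  # [1]
print([cell.value for cell in iter_cells(block)])   # [2, 3, 4, 5]

# Setting values works the same way for either shape
for cell in iter_cells(block):
    cell.value = "new"
print([cell.value for cell in iter_cells(block)])   # ['new', 'new', 'new', 'new']
```

Inside the wrapper, both __getitem__ and __setitem__ could loop over iter_cells(self.wb[sht][rng]) instead of assuming a fixed nesting depth.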

Python and Excel Formula

Complete beginner here but have a specific need to try and make my life easier with automating Excel.
I have a weekly report that contains a lot of useless columns and using Python I can delete these and rename them, with the code below.
from openpyxl import Workbook, load_workbook
wb = load_workbook('TestExcel.xlsx')
ws = wb.active
ws.delete_cols(1,3)
ws.delete_cols(3,8)
ws.delete_cols(4,3)
ws.insert_cols(3,1)
ws['A1'].value = "Full Name"
ws['C1'].value = "Email Address"
ws['C2'].value = '=B2&"#testdomain.com"'
wb.save('TestExcelUpdated.xlsx')
This does the job but I would like the formula to continue from B2 downwards (since the top row are headings).
ws['C2'].value = '=B2&"#testdomain.com"'
Obviously, in Excel it is just a case of dragging the formula down to the end of the column but I'm at a loss to get this working in Python. I've seen similar questions asked but the answers are over my head.
Would really appreciate a dummies guide.
Example of Excel report after Python code
One way to do this is by iterating over the rows in your worksheet.
for row in ws.iter_rows(min_row=2):  # min_row ensures you skip your header row
    row[2].value = '=B' + str(row[0].row) + '&"#testdomain.com"'
row[2].value selects the third column due to zero-based indexing. row[0].row gets the number corresponding to the current row.
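For anyone who finds the string concatenation hard to read, here is a small sketch of the same "drag the formula down" idea using a template with a {row} placeholder. The function name fill_down_formulas and the row range 2 to 4 are made up for illustration; the code only builds the formula strings and does not touch a workbook.

```python
def fill_down_formulas(template, first_row, last_row):
    """Return {row_number: formula}, with '{row}' replaced by each row number."""
    return {
        row: template.format(row=row)
        for row in range(first_row, last_row + 1)
    }


# Same formula as in the question, with the row number left as a placeholder
formulas = fill_down_formulas('=B{row}&"#testdomain.com"', 2, 4)
for row, formula in formulas.items():
    print(row, formula)
# 2 =B2&"#testdomain.com"
# 3 =B3&"#testdomain.com"
# 4 =B4&"#testdomain.com"
```

With openpyxl, each entry could then be written with ws.cell(row=row, column=3).value = formula, using ws.max_row as the last row.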

openpyxl: Append data to first empty column cell

Background:
I have an excel workbook containing metadata which spread across various worksheets. I need to take the relevant columns of data from the various worksheets and combine them into a single worksheet. With the following code I have been able to create a new worksheet and add data to it.
# Open workbook and assign worksheet
try:
    wb = openpyxl.load_workbook(metadata)
    shtEditionLNM = wb.worksheets[0]  # Edition date & latest NM
    shtChartsTitles = wb.worksheets[1]  # Charts & Titles
    shtDepthHeight = wb.worksheets[4]  # Depth & heights
    shtChartProj = wb.worksheets[7]  # Chart Projection
except:
    raise SystemExit(0)
new = wb.create_sheet()
new.title = "MT_CHARTS INFO"
new.sheet_properties.tabColor = "1072BA"
shtMeta = wb.get_sheet_by_name("MT_CHARTS INFO")
for row in shtChartsTitles.rows:
    shtMeta.append([row[0].value, row[1].value, row[2].value, row[4].value])
for row in shtEditionLNM.rows:
    shtMeta.append([row[3].value, row[4].value])
wb.save('OW - Quarterly Extract of Metadata for Raster Charts Dec 2015.xlsx')
This works without any errors and I can see the data saved to my new workbook. However when I run a second loop and append values they are appended to cell A3169 whereas I actually want them to populate from E1.
My question boils down to 'is there a way I can append to a new column instead of a new row?'
Thanks in advance!
Not directly: ws.append() works with rows because this is the way the data is stored and thus the easiest to optimise for the read-only and write-only modes.
However, ws.cell(row=x, column=y, value=z) will allow you to do what you want. Version 2.4 (install from a checkout) will also let you work directly with columns by managing the assignment to cells for you: ws['E'] will return a tuple of the cells in the column up to the current ws.max_row; ws.iter_cols(min_col, min_row, max_col, max_row) will return a generator of columns as big as you need it.
Thank you Charlie,
Your answer gave me the direction I needed to get this done. Referring to this question:
how to write to a new cell in python using openpyxl
I've found out there are many ways to skin this cat - the method below is what I went for in the end!
x = 0
for row in shtEditionLNM.rows:
    x += 1
    shtMeta.cell(coordinate="E{}".format(x)).value = row[3].value
    shtMeta.cell(coordinate="F{}".format(x)).value = row[4].value
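Another option, since append() only works row by row, is to line the two blocks up in memory first and append each combined row once. The sketch below is plain Python with made-up sample data; join_side_by_side and its left_width parameter are hypothetical names, and the rows are ordinary lists standing in for the cell values read from each sheet.

```python
from itertools import zip_longest


def join_side_by_side(left_rows, right_rows, left_width):
    """Pair up two lists of rows so the right block starts at row 1,
    immediately to the right of the left block.

    Short or missing left rows are padded with None so that the right
    block always begins in column `left_width + 1`.
    """
    merged = []
    for left, right in zip_longest(left_rows, right_rows):
        left = list(left) if left is not None else []
        right = list(right) if right is not None else []
        # Pad the left part so the right part stays aligned
        left += [None] * (left_width - len(left))
        merged.append(left + right)
    return merged


# Hypothetical values standing in for the two source sheets
charts = [["C1", "Title 1"], ["C2", "Title 2"], ["C3", "Title 3"]]
editions = [["2015-01", 10], ["2015-02", 11]]

for row in join_side_by_side(charts, editions, left_width=2):
    print(row)
# ['C1', 'Title 1', '2015-01', 10]
# ['C2', 'Title 2', '2015-02', 11]
# ['C3', 'Title 3']
```

Each merged row could then go straight into shtMeta.append(row), which avoids tracking cell coordinates by hand.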
I am new to openpyxl, but I believe we can convert a list into a list of tuples, one per element, and then pass each tuple to the sheet.append() function:
L1=[a,b,c,d.....]
L2=[]
for a in L1:
    L2.append(tuple(a))
for a in L2:
    sheet.append(a)
Please feel free to correct me.

xlwings function to find the last row with data

I am trying to find the last row in a column with data. to replace the vba function: LastRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
I am trying this, but this pulls in all rows in Excel. How can I just get the last row.
from xlwings import Workbook, Range
wb = Workbook()
print len(Range('A:A'))
Consolidating the answers above, you can do it in one line:
wb.sheet.range(column + last cell value).Get End of section going up[non blank assuming the last cell is blank].row
Example code:
import xlwings as xw
from xlwings import Range, constants
wb = xw.Book(r'path.xlsx')
wb.sheets[0].range('A' + str(wb.sheets[0].cells.last_cell.row)).end('up').row
We can use Range object to find the last row and/or the last column:
import xlwings as xw
# open raw data file
filename_read = 'data_raw.csv'
wb = xw.Book(filename_read)
sht = wb.sheets[0]
# find the numbers of columns and rows in the sheet
num_col = sht.range('A1').end('right').column
num_row = sht.range('A1').end('down').row
# collect data
content_list = sht.range((1,1),(num_row,num_col)).value
print(content_list)
This is very much the same as crazymachu's answer, just wrapped up in a function. Since version 0.9.0 of xlwings you can do this:
import xlwings as xw
def lastRow(idx, workbook, col=1):
    """ Find the last row in the worksheet that contains data.
    idx: Specifies the worksheet to select. Starts counting from zero.
    workbook: Specifies the workbook
    col: The column in which to look for the last cell containing data.
    """
    ws = workbook.sheets[idx]
    lwr_r_cell = ws.cells.last_cell  # lower right cell
    lwr_row = lwr_r_cell.row  # row of the lower right cell
    lwr_cell = ws.range((lwr_row, col))  # change to your specified column
    if lwr_cell.value is None:
        lwr_cell = lwr_cell.end('up')  # go up until you hit a non-empty cell
    return lwr_cell.row
Intuitively, the function starts off by finding the most extreme lower-right cell in the workbook. It then moves across to your selected column and then up until it hits the first non-empty cell.
You could try using Direction by starting at the very bottom and then moving up:
import xlwings
from xlwings.constants import Direction
wb = xlwings.Workbook(r'data.xlsx')
print(wb.active_sheet.xl_sheet.Cells(65536, 1).End(Direction.xlUp).Row)
Try this:
import xlwings as xw
cellsDown = xw.Range('A1').vertical.value
cellsRight = xw.Range('A1').horizontal.value
print len(cellsDown)
print len(cellsRight)
One could use the VBA Find function that is exposed through api property (use it to find anything with a star, and begin your search from the first cell).
Example:
row_cell = s.api.Cells.Find(What="*",
                            After=s.api.Cells(1, 1),
                            LookAt=xlwings.constants.LookAt.xlPart,
                            LookIn=xlwings.constants.FindLookIn.xlFormulas,
                            SearchOrder=xlwings.constants.SearchOrder.xlByRows,
                            SearchDirection=xlwings.constants.SearchDirection.xlPrevious,
                            MatchCase=False)
column_cell = s.api.Cells.Find(What="*",
                               After=s.api.Cells(1, 1),
                               LookAt=xlwings.constants.LookAt.xlPart,
                               LookIn=xlwings.constants.FindLookIn.xlFormulas,
                               SearchOrder=xlwings.constants.SearchOrder.xlByColumns,
                               SearchDirection=xlwings.constants.SearchDirection.xlPrevious,
                               MatchCase=False)
print((row_cell.Row, column_cell.Column))
Other methods outlined here seem to require that there are no empty rows/columns between data.
source: https://gist.github.com/Elijas/2430813d3ad71aebcc0c83dd1f130e33
python 3.6, xlwings 0.11
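To illustrate why gaps matter, here is a plain-Python model of the two strategies. It does not use xlwings at all: a column is just a list in which None stands for an empty cell, and end_down only imitates the simple case of walking down from a filled cell until the next cell is empty. Both function names and the sample data are assumptions made for this sketch.

```python
def end_down(column, start=1):
    """Walk down from a filled cell until the next cell is empty.

    `column` is a list of cell values where index 0 is row 1.
    Returns a 1-based row number.
    """
    row = start
    while row < len(column) and column[row] is not None:
        row += 1
    return row


def last_filled_row(column):
    """Scan from the bottom up; return the 1-based row of the last value, or 0."""
    for row in range(len(column), 0, -1):
        if column[row - 1] is not None:
            return row
    return 0


# Row 4 is an empty gap in the middle of the data
values = ["id", 1, 2, None, 4, 5, None, None]

print(end_down(values))         # 3
print(last_filled_row(values))  # 6
```

Walking down stops at the gap and reports row 3, while searching backwards from the end finds row 6, which is the behaviour the Find-based approach above is after.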
Solution 1
To find last row with data, you should do some work both horizontally and vertically. You have to go through every column to determine which row is the last row.
import xlwings
workbook_all = xlwings.Book(r'path.xlsx')
objectiveSheet = workbook_all.sheets['some_sheet']
# lastCellContainData(), inspired by Stefan's answer.
def lastCellContainData(objectiveSheet, lastRow=None, lastColumn=None):
    lastRow = objectiveSheet.cells.last_cell.row if lastRow is None else lastRow
    lastColumn = objectiveSheet.cells.last_cell.column if lastColumn is None else lastColumn
    lastRows, lastColumns = [], []
    for col in range(1, lastColumn):
        lastRows.append(objectiveSheet.range((lastRow, col)).end('up').row)
    # extract last row of every column, then max(). Or you can compare the next
    # column's last row number to the last column's last row number. Here you get
    # the last row with data, you can also go further get the last column with data:
    for row in range(1, lastRow):
        lastColumns.append(objectiveSheet.range((row, lastColumn)).end('left').column)
    return max(lastRows), max(lastColumns)
lastCellContainData(objectiveSheet, lastRow=5000, lastColumn=300)
I added lastRow and lastColumn. To make the program more effective, you can set these parameters according to the approximate shape of the data you're dealing with.
Solution 2
xlwings is known as a wrapper around pywin32. I don't know if your situation allows for keyboard or mouse input. If so, first press ctrl+tab to switch to the workbook, then ctrl+a to select the region containing data, and then call workbook_all.selection.rows.count.
Another way:
When you know roughly where the bottom-right cell of your data is, say AAA10000, just call objectiveSheet.range('A1:'+'AAA10000').current_region.rows.count
Update:
After a while none of the solutions were really intuitive to me, so I decided to compile the following:
Code:
import xlwings as Objxlwings
import xlwings.constants
def Return_RangeLastCell(ObjWS):
    return ObjWS.api.Cells.SpecialCells(xlwings.constants.CellType.xlCellTypeLastCell)
I tried to keep consistency with the way to call it from Excel to keep it simple
Then on my main code, I just call it like so:
ObjWS=Objxlwings.Book('Book1.xlsx').sheets["Sheet1"]
print(Return_RangeLastCell(ObjWS).Column)
Interesting solutions. But maybe like this:
print(sheet.used_range.last_cell.row)
@Cody's answer will help under normal circumstances, but if your sheet has hidden rows at the bottom, as in the linked example, it will give the wrong row number.
Let's say your data has 10 rows and row[5:11] are hidden, i.e. the actual last row is 10.
[code a] below will give you 5; [code b] below will give you 10.
code a:
ws = wb.sheets[your_sheet_name]
last_row = ws.range('A' + str(ws.cells.last_cell.row)).end('up').row # return 5
code b:
ws = wb.sheets[your_sheet_name]
last_row_1 = ws.used_range.last_cell.row # return 10

How to wait until Excel calculates formulas before continuing with win32com

I have a win32com Python script that combines multiple Excel files into a spreadsheet and saves it as a PDF.
How it works now is that the output is almost all #NAME? because the file is output before the Excel file's contents are calculated (which may take up to a minute).
How do I force the workbook to calculate the values and wait until it's done before continuing?
excel = win32.Dispatch('Excel.Application')
# moving stuff to this spreadsheet
wb1 = excel.Workbooks.Open(filepath1)
ws1 = excel.ActiveSheet
# from this spreadsheet
wb2 = excel.Workbooks.Open(filepath2)
ws2 = excel.ActiveSheet
# supposedly this should do it, but I haven't seen results
ws1.EnableCalculation = True
ws2.EnableCalculation = True
ws1.Calculate
ws2.Calculate
wb1.Cells(2, 4).Value = wb2.Cells(1,1).Value # doing stuff with values
# right here I need it to wait and calculate everything
# so when I export it, I see the values, not the formula or "#NAME?"
wb1.Save()
ws1.ExportAsFixedFormat(0, r'C:\filename.pdf')
wb1.Close(True)
wb2.Close(True)
excel.Application(Quit)
One kind of silly thing I did which actually worked for the cells, but not for the graphs, was go through the entire sheet setting the values of the worksheet to themselves, so that the formula is overwritten by the numerical value. However, the graphs still weren't updated to their values.
used = ws1.UsedRange  # not named 'range', which would shadow the built-in used in the loops
num_rows = used.Row + used.Rows.Count - 1
num_cols = used.Column + used.Columns.Count - 1
for i in range(1, num_rows):
    for j in range(1, num_cols):
        ws1.Cells(i, j).Value = ws1.Cells(i, j).Value
You have to be careful to call functions using the () syntax:
ws1.Calculate()
ws2.Calculate()
This will calculate all cells of both worksheets. It appears that each function returns only after it has finished all computations, which is the effect you want, so unless I'm misunderstanding your question that fix should be sufficient.
From my comment above:
The problem ended up being something else, which I answered here: https://stackoverflow.com/a/25495515/2374028
If you use any sort of scripting language, including python's win32com, it doesn't automatically include add-ins, and my calculations used add-ins, so it was just skipping over them.
