Background:
I have an Excel workbook containing metadata which is spread across various worksheets. I need to take the relevant columns of data from the various worksheets and combine them into a single worksheet. With the following code I have been able to create a new worksheet and add data to it.
# Open workbook and assign worksheet
try:
    wb = openpyxl.load_workbook(metadata)
    shtEditionLNM = wb.worksheets[0]  # Edition date & latest NM
    shtChartsTitles = wb.worksheets[1]  # Charts & Titles
    shtDepthHeight = wb.worksheets[4]  # Depth & heights
    shtChartProj = wb.worksheets[7]  # Chart Projection
except:
    raise SystemExit(0)
new = wb.create_sheet()
new.title = "MT_CHARTS INFO"
new.sheet_properties.tabColor = "1072BA"
shtMeta = wb.get_sheet_by_name("MT_CHARTS INFO")
for row in shtChartsTitles.rows:
    shtMeta.append([row[0].value, row[1].value, row[2].value, row[4].value])
for row in shtEditionLNM.rows:
    shtMeta.append([row[3].value, row[4].value])
wb.save('OW - Quarterly Extract of Metadata for Raster Charts Dec 2015.xlsx')
This works without any errors and I can see the data saved to my new workbook. However when I run a second loop and append values they are appended to cell A3169 whereas I actually want them to populate from E1.
My question boils down to 'is there a way I can append to a new column instead of a new row?'
Thanks in advance!
Not directly: ws.append() works with rows because this is the way the data is stored and thus the easiest to optimise for the read-only and write-only modes.
However, ws.cell(row=x, column=y, value=z) will allow you to do what you want. Version 2.4 (install from a checkout) will also let you work directly with columns by managing the assignment to cells for you: ws['E'] will return a tuple of the cells in the column up to the current ws.max_row; ws.iter_cols(min_col, min_row, max_col, max_row) will return a generator of columns as big as you need it.
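A minimal sketch of the ws.cell() approach, assuming a small in-memory workbook rather than the asker's file. The data and the names first_batch and second_batch are made up for illustration. The second batch is written next to the first one, starting at E1, instead of being appended below it:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Hypothetical data standing in for the two source worksheets.
first_batch = [("chart1", "title1", "x", "y"), ("chart2", "title2", "x", "y")]
second_batch = [("2015-12-01", "NM 10"), ("2015-12-02", "NM 11")]

# append() always adds a new row below the existing data (columns A-D here).
for record in first_batch:
    ws.append(record)

# cell(row=..., column=..., value=...) writes to an explicit position,
# so the second batch can start at E1 (column 5) instead of on a new row.
for row_idx, record in enumerate(second_batch, start=1):
    for col_offset, value in enumerate(record):
        ws.cell(row=row_idx, column=5 + col_offset, value=value)
```

Because the row index comes from enumerate() rather than from the sheet, the second loop does not depend on how many rows the first loop wrote.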
Thank you Charlie,
Your answer gave me the direction I needed to get this done. Referring to this question:
how to write to a new cell in python using openpyxl
I've found out there are many ways to skin this cat - the method below is what I went for in the end!
x = 0
for row in shtEditionLNM.rows:
    x += 1
    # Index the sheet by coordinate; the coordinate= keyword of cell() is not available in newer openpyxl releases
    shtMeta["E{}".format(x)].value = row[3].value
    shtMeta["F{}".format(x)].value = row[4].value
I am new to openpyxl, but I believe we can convert a list into a list of one-element tuples, and then pass each tuple to the sheet.append() function:
L1=[a,b,c,d.....]
L2=[]
for a in L1:
    L2.append((a,))  # one-element tuple; tuple(a) would split a string into characters
for row in L2:
    sheet.append(row)  # append one row at a time, not the whole list
Please feel free to correct me.
Related
I am trying to insert values from a list into Excel. I know that I could do the same with a dictionary, but I would like to do it this way, from a list. The code writes a value, but only one of them: for instance, the value Salsa is what appears in the column. Thank you in advance!
import openpyxl
wb = openpyxl.load_workbook("Python_Example.xlsx")
sheet = wb.active  # not shown in the original post; assumed to be the active sheet
list_of_music = list(sheet.columns)[4]  # With this I can loop over the cells of column index 4
favorite_music = ['Rock', 'Bachata', 'Salsa']
for cellObj in list_of_music:
    for item in favorite_music:
        cellObj.value = str(item)
wb.save("Python_Example.xlsx")
Check the openpyxl docs; they include some good basic tutorials that will help you, especially for iterating over ranges of cells. iter_rows and iter_cols are also very useful tools that may help you here. A simple solution would consist of:
import openpyxl as op
# Create example workbook
wb = op.Workbook()
ws = wb.active
favourite_music = ['Rock', 'Bachata', 'Salsa']
for i, music in enumerate(favourite_music):
    ws.cell(row=i+1, column=4).value = music
wb.save('Example.xlsx')
Complete beginner here but have a specific need to try and make my life easier with automating Excel.
I have a weekly report that contains a lot of useless columns and using Python I can delete these and rename them, with the code below.
from openpyxl import Workbook, load_workbook
wb = load_workbook('TestExcel.xlsx')
ws = wb.active
ws.delete_cols(1,3)
ws.delete_cols(3,8)
ws.delete_cols(4,3)
ws.insert_cols(3,1)
ws['A1'].value = "Full Name"
ws['C1'].value = "Email Address"
ws['C2'].value = '=B2&"@testdomain.com"'
wb.save('TestExcelUpdated.xlsx')
This does the job but I would like the formula to continue from B2 downwards (since the top row contains headings).
ws['C2'].value = '=B2&"@testdomain.com"'
Obviously, in Excel it is just a case of dragging the formula down to the end of the column but I'm at a loss to get this working in Python. I've seen similar questions asked but the answers are over my head.
Would really appreciate a dummies guide.
Example of Excel report after Python code
One way to do this is by iterating over the rows in your worksheet.
for row in ws.iter_rows(min_row=2):  # min_row ensures you skip your header row
    row[2].value = '=B' + str(row[0].row) + '&"@testdomain.com"'
row[2].value selects the third column due to zero based indexing. row[0].row gets the number corresponding to the current row
So I've been trying to create a data store using google sheets (It's easier for me to navigate). The way I'm trying to do this is by creating a new worksheet with the user's ID and putting the information I have saved in a separate worksheet called 'template' into the new worksheet.
This is my current code:
newsheet = sh.add_worksheet(title = f"{author}", rows = "152", cols = "2")
for index in range(1, len(savefiletemplate.col_values(1))):
    newsheet.update_cell(index, 1, savefiletemplate.cell(index, 1).value)
author is the user's ID, sh is my spreadsheet and savefiletemplate is my template worksheet
It gives me a very long error that I don't understand after copying 90-120 cells. I was wondering if this is my fault or my IDE's fault and if anyone knows how to fix it
I would try something like this...
rows = 152
cols = 2
template_data = savefiletemplate.get_all_values() # this will be a list of lists
newsheet = sh.add_worksheet(title=author, rows=rows, cols=cols)
a1_range = f"A1:{chr(ord('@')+cols)}{rows}" # A1:B152
newsheet.update(a1_range, template_data)
This basically takes your template and puts it into a list of lists. Then uses the rows and cols value to build the proper a1 notation for the range. Then updates the range in the new sheet with the template_data. Give it a shot and see if it does what you're looking for.
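Note that the chr(ord(...) + cols) expression only covers single-letter columns (A to Z). If the template ever grows wider than that, something like the following could build the range instead. This is a plain-Python sketch with no gspread calls, and the helper names column_letter and a1_range are made up here:

```python
def column_letter(index):
    """Convert a 1-based column index to spreadsheet letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        # Spreadsheet columns are a bijective base-26 system, hence the "- 1".
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def a1_range(rows, cols):
    """Build the A1 notation for a block of the given size anchored at A1."""
    return f"A1:{column_letter(cols)}{rows}"


print(a1_range(152, 2))   # A1:B152
print(a1_range(10, 28))   # A1:AB10
```

The resulting string can be used wherever the a1_range variable is used above.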
I'm trying to read data from an Excel sheet that contains merged cells.
When reading merged cells with openpyxl the first merged cell contain the value and the rest of the cells are empty.
I would like to know about each cell if it is merged and how many cells are merged but I couldn't find any function that does so.
The sheet also has other empty cells, so I can't rely on emptiness to detect merged cells.
You can use merged_cells.ranges (merged_cell_ranges has been deprecated in version 2.5.0-b1 (2017-10-19), changed to merged_cells.ranges) on the sheet (can't seem to find per row) like this:
from openpyxl import load_workbook
wb = load_workbook(filename='a file name')
sheet_ranges = wb['Sheet1']
print(sheet_ranges.merged_cells.ranges)
To test if a single cell is merged or not you can check the class (name):
cell = sheet.cell(row=15, column=14)
if type(cell).__name__ == 'MergedCell':
    print("Oh no, the cell is merged!")
else:
    print("This cell is not merged.")
To "unmerge" all cells you can use the function unmerge_cells
for items in list(sheet.merged_cells.ranges):  # iterate over a copy, since unmerging modifies the collection
    print(items)
    sheet.unmerge_cells(str(items))
To test if a single cell is merged, I loop through sheet.merged_cells.ranges like @A. Lau suggests.
Unfortunately, checking the cell type like @0x4a6f4672 shows does not work any more.
Here is a function that shows you how to do this.
def testMerge(row, column):
    cell = sheet.cell(row, column)
    for mergedCell in sheet.merged_cells.ranges:
        if (cell.coordinate in mergedCell):
            return True
    return False
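The question also asks how many cells are merged together. Here is a sketch of one way to get that, assuming an in-memory workbook with a made-up merge (B2:D3); the helper name merge_info is hypothetical. It relies only on the bounds (min_row, max_row, min_col, max_col) of each entry in merged_cells.ranges:

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet["B2"] = "merged value"
sheet.merge_cells("B2:D3")  # made-up merge: 2 rows x 3 columns


def merge_info(sheet, row, column):
    """Return (range string, number of merged cells) for a cell, or None if it is not merged."""
    for merged in sheet.merged_cells.ranges:
        in_rows = merged.min_row <= row <= merged.max_row
        in_cols = merged.min_col <= column <= merged.max_col
        if in_rows and in_cols:
            n_rows = merged.max_row - merged.min_row + 1
            n_cols = merged.max_col - merged.min_col + 1
            return merged.coord, n_rows * n_cols
    return None


print(merge_info(sheet, 3, 4))  # cell D3, inside the merge
print(merge_info(sheet, 1, 1))  # cell A1, not merged
```

Comparing numeric bounds avoids depending on how a particular openpyxl version implements the "in" test for merged ranges.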
The question asks about detecting merged cells and reading them, but so far the provided answers only deal with detecting and unmerging. Here is a function which returns the logical value of the cell, the value that the user would see as contained on a merged cell:
import sys
from openpyxl import load_workbook
from openpyxl.cell.cell import MergedCell
def cell_value(sheet, coord):
    cell = sheet[coord]
    if not isinstance(cell, MergedCell):
        return cell.value
    # "Oh no, the cell is merged!"
    for merged_range in sheet.merged_cells.ranges:  # renamed to avoid shadowing the built-in range
        if coord in merged_range:
            return merged_range.start_cell.value
    raise AssertionError('Merged cell is not in any merge range!')
workbook = load_workbook(sys.argv[1])
print(cell_value(workbook.active, sys.argv[2]))
These all helped (thanks), but when I used the approaches with a couple of spreadsheets, it wasn't unmerging all the cells I expected. I had to loop and retest for merges to finally get them all to complete. In my case, it took 4 passes to get everything to unmerge as expected:
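mergedRanges = sheet_ranges.merged_cells.ranges
### How many times do we run unmerge?
i = 0
### keep testing and removing ranges until they are all actually gone
while mergedRanges:
    # iterate over a copy: unmerge_cells() removes entries from mergedRanges,
    # and modifying a collection while looping over it skips entries (hence the extra passes)
    for entry in list(mergedRanges):
        i += 1
        print("  unMerging: " + str(i) + ": " + str(entry))
        sheet_ranges.unmerge_cells(str(entry))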
Here is the Excel file in question:
Context: I am writing a program which can pull values from a PDF and put them in the appropriate cell in an Excel file.
Question: I want to write a function which takes a column value (e.g. 2014) and a row value (e.g. 'COGS') as arguments and return the cell reference where those two intersect (e.g. 'C3' for 2014 COGS).
def find_correct_cell(year=2014, item='COGS'):
    #do something similar to what the =match function in Excel does
    return cell_reference #returns 'C3'
I have already tried using openpyxl like this to change the values of some random empty cells where I can store these values:
col_num = '=match(2014, A1:E1)'
row_num = '=match("COGS", A1:A5)'
But I want to grab those values without having to arbitrarily write to those random empty cells. Plus, even with this method, when I read those cells (F5 and F6) it reads the formulae in those cells and not the face value of 3.
Any help is appreciated, thanks.
Consider a translated VBA solution as the Match function can adequately handle your needs. Python can access the Excel VBA Object Library using a COM interface with the win32com module. Please note this solution assumes you are using Excel for PC. Below includes the counterpart VBA function.
VBA Function (native interface)
If the function below is placed in a standard Excel module, it can be called from a spreadsheet cell: =FindCell(..., ###)
' MATCHES ROW AND COL INPUT FOR CELL ADDRESS OUTPUT
Function FindCell(item As String, year As Integer) As String
    FindCell = Cells(Application.Match(item, Range("A1:A5"), 0), _
                     Application.Match(year, Range("A1:E1"), 0)).Address
End Function
Debug.Print FindCell("COGS", 2014)
' $C$3
Python Script (foreign interface, requiring all objects to be declared)
Try/Except/Finally is used to properly close the Excel process regardless of script success or fail.
import win32com.client
# MATCHES ROW AND COL INPUT FOR CELL ADDRESS OUTPUT
def FindCell(item, year):
    return(xlWks.Cells(xlApp.WorksheetFunction.Match(item, xlWks.Range("A1:A5"), 0),
                       xlApp.WorksheetFunction.Match(year, xlWks.Range("A1:E1"), 0)).Address)
try:
    xlApp = win32com.client.Dispatch("Excel.Application")
    xlWbk = xlApp.Workbooks.Open('C:/Path/To/Workbook.xlsx')
    xlWks = xlWbk.Worksheets("SHEETNAME")
    print(FindCell("COGS", 2014))
    # $C$3
except Exception as e:
    print(e)
finally:
    xlWbk.Close(False)  # close without saving changes
    xlApp.Quit()  # call Quit explicitly so the Excel process exits
    xlWks = None
    xlWbk = None
    xlApp = None
There are a surprising number of details you need to get right to manipulate Excel files this way with openpyxl. First, it's worth knowing that the xlsx file contains two representations of each cell - the formula, and the current value of the formula. openpyxl can return either, and if you want values you should specify data_only=True when you open the file. Also, openpyxl is not able to calculate a new value when you change the formula for a cell - only Excel itself can do that. So inserting a MATCH() worksheet function won't solve your problem.
The code below does what you want, mostly in Python. It uses the "A1" reference style, and does some calculations to turn column numbers into column letters. This won't hold up well if you go past column Z. In that case, you may want to switch to numbered references to rows and columns. There's some more info on that here and here. But hopefully this will get you on your way.
Note: This code assumes you are reading a workbook called 'test.xlsx', and that 'COGS' is in a list of items in 'Sheet1!A2:A5' and 2014 is in a list of years in 'Sheet1!B1:E1'.
import openpyxl
def get_xlsx_region(xlsx_file, sheet, region):
    """ Return a rectangular region from the specified file.
    The data are returned as a list of rows, where each row contains a list
    of cell values"""
    # 'data_only=True' tells openpyxl to return values instead of formulas
    # 'read_only=True' makes openpyxl much faster (fast enough that it
    # doesn't hurt to open the file once for each region).
    wb = openpyxl.load_workbook(xlsx_file, data_only=True, read_only=True)
    reg = wb[sheet][region]
    return [[cell.value for cell in row] for row in reg]
# cache the lists of years and items
# get the first (only) row of the 'B1:E1' region
years = get_xlsx_region('test.xlsx', 'Sheet1', 'B1:E1')[0]
# get the first (only) column of the 'A2:A5' region
items = [r[0] for r in get_xlsx_region('test.xlsx', 'Sheet1', 'A2:A5')]
def find_correct_cell(year, item):
    # find the indexes for 'COGS' and 2014
    year_col = chr(ord('B') + years.index(year))  # only works in A:Z range
    item_row = 2 + items.index(item)
    cell_reference = year_col + str(item_row)
    return cell_reference
print(find_correct_cell(year=2014, item='COGS'))
# C3
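The chr(ord('B') + ...) step is the part that breaks past column Z. Here is a sketch of the same lookup without that limit, using openpyxl.utils.get_column_letter for the conversion. The years and items lists are made-up stand-ins for what would be read from the sheet, and find_cell with its first_col / first_row parameters is a hypothetical helper, not part of openpyxl:

```python
from openpyxl.utils import get_column_letter

# Made-up header data: 30 years starting in column B, items starting in row 2.
years = list(range(1990, 2020))
items = ["Revenue", "COGS", "Gross Profit", "Opex"]


def find_cell(years, items, year, item, first_col=2, first_row=2):
    """Return the A1 reference where the given year column and item row intersect."""
    col = first_col + years.index(year)   # 1-based column number
    row = first_row + items.index(item)   # 1-based row number
    return f"{get_column_letter(col)}{row}"


print(find_cell(years, items, 1991, "COGS"))  # C3
print(find_cell(years, items, 2016, "COGS"))  # AB3
```

list.index() raises ValueError when the year or item is missing, which is probably the behaviour you want when a value pulled from the PDF does not match any header.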