I am working on some code trying to change cells in an existing .xlsx file.
To do that, I look for all empty ("None") cells in column A and try to copy the last non-empty value into each of them.
Unfortunately, the cell values are not overwritten by the following lines.
I only get the right result if I hard-code a row number such as 18, but I want to iterate over the rows.
import openpyxl
path= 'mypath/Python/Excel_test.xlsx'
workbook = openpyxl.load_workbook( path )
worksheet = workbook.get_sheet_by_name('Tabelle1')
i=1
for row in worksheet.iter_rows(values_only=True):
    if row[0]!=None:
        text=row[0]
        print(row)
    else:
        worksheet.cell(row=i,column=1).value=text
        print(row)
    i=i+1
workbook.save('test.xlsx')
Thanks a lot
Dominik
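One possible way to do the fill-down described in this question is sketched below. It is only an illustration under a few assumptions: the helper name fill_down is made up, the workbook is built in memory instead of loading Excel_test.xlsx, and the row number is taken from the loop range rather than from a separate counter. Note also that the original snippet saves to 'test.xlsx', so the changes would show up in that file and not in the file that was loaded.

```python
from openpyxl import Workbook

def fill_down(worksheet, column=1):
    """Copy the last non-empty value of `column` into every empty cell below it."""
    last_value = None
    for row_idx in range(1, worksheet.max_row + 1):
        cell = worksheet.cell(row=row_idx, column=column)
        if cell.value is not None:
            last_value = cell.value      # remember the most recent value
        elif last_value is not None:
            cell.value = last_value      # overwrite the empty cell

# In-memory stand-in for the real sheet (assumption for this sketch)
wb = Workbook()
ws = wb.active
for value in ["a", None, None, "b", None]:
    ws.append([value, "x"])

fill_down(ws)
filled = [ws.cell(row=r, column=1).value for r in range(1, ws.max_row + 1)]
print(filled)
```

Because the row index comes straight from the loop, it cannot drift out of step with the row being inspected.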
I'm working with an API that I was given and am having trouble writing to the correct row in an Excel file. The code is supposed to find the first row in the table that has any empty cells and then write to it. Sometimes the row it picks is one that is already full, so existing data gets overwritten, or it skips rows that have data in only 11 of 30 cells. I have two programs doing this, and both have the same issue.
Here is the row selection and printing portion of the code.
wb = openpyxl.load_workbook(path)
ws = wb.active
# This loop will go over the excel rows to find the first empty row. It increments the variable
# "firstEmptyRow" until it finds the first empty row.
firstEmptyRow = 0
for row in ws:
    if not any([cell.value == None for cell in row]):
        firstEmptyRow += 1
print(firstEmptyRow)
# The following lines will post the dataframes in the excel path.
with pd.ExcelWriter(path, mode="a",engine = "openpyxl", if_sheet_exists="overlay" ) as writer:
    allDf.to_excel(writer, sheet_name= "Main Table", index = False, header = False,startrow=firstEmptyRow,startcol=0)
with pd.ExcelWriter(path, mode="a",engine = "openpyxl", if_sheet_exists="overlay" ) as writer:
    combinedArray.to_excel(writer, sheet_name= "Main Table", index = False, header = False,startrow=firstEmptyRow,startcol=18)
Thank you for any help you have. :) Let me know if you have any questions.
To try to fix this, I made sure there is data in each row (there is). I also deleted any data that may have been in cells outside the table. Neither helped.
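A possible reason for the behaviour described here is that the loop counts every completely filled row anywhere in the sheet, so the counter no longer points at the first row with a gap once a partly filled row appears in the middle. The sketch below returns the position of the first row with an empty cell instead. The function name first_incomplete_row and the n_cols parameter are hypothetical, and the workbook is built in memory for the demonstration. Since pandas' startrow is zero-based while openpyxl rows are one-based, the sketch subtracts one.

```python
from openpyxl import Workbook

def first_incomplete_row(ws, n_cols):
    """Return the 1-based number of the first row with an empty cell in columns 1..n_cols."""
    rows = ws.iter_rows(min_row=1, max_row=ws.max_row,
                        min_col=1, max_col=n_cols, values_only=True)
    for row_number, values in enumerate(rows, start=1):
        if any(value is None for value in values):
            return row_number
    return ws.max_row + 1  # every existing row is full, so use the next one

# In-memory stand-in for the real table (assumption for this sketch)
wb = Workbook()
ws = wb.active
ws.append([1, 2, 3])
ws.append([4, 5, 6])
ws.append([7, None, 9])   # first row with a gap
ws.append([10, 11, 12])

target = first_incomplete_row(ws, n_cols=3)
startrow = target - 1     # value that could be passed to DataFrame.to_excel(startrow=...)
print(target, startrow)
```

Limiting the check to the table's own columns (n_cols) avoids stray cells outside the table influencing the result.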
In my project I am opening an Excel file with multiple sheets. I want to manipulate "sheet2" in Python (which works fine) and after that overwrite the old "sheet2" with the new one but KEEP the formatting. Something like this:
import pandas as pd
update_sheet2 = pd.read_excel(newest_isaac_file, sheet_name='sheet2')
#do stuff with the sheet
# KEEP_FORMATTING is not a real pandas parameter; it stands for the behaviour I am looking for
with pd.ExcelWriter(filepath, engine='openpyxl', if_sheet_exists='replace', mode='a',
                    KEEP_FORMATTING = True) as writer:
    df.to_excel(writer, sheet_name=sheetname, index=index)
In other words: Is there a way to get the formatting from an existing Excel sheet?
I could not find anything about that. I know I can manually set the formatting in Python but the formatting of the existing sheet is really complicated and has to stay the same.
Thanks for your help!
As per your comment, try this code. It opens a file (Sample.xlsx), goes to a sheet (Sheet1), inserts a new row at row 15, and then copies the text and formatting from row 22 into the new, empty row 15. The code and a screenshot of the result are below.
import openpyxl
from copy import copy
wb=openpyxl.load_workbook('Sample.xlsx') #Load workbook
ws=wb['Sheet1'] #Open sheet
ws.insert_rows(15, 1) #Insert one row at 15 and move everything one row downwards
for row in ws.iter_rows(min_row=22, max_row=22, min_col=1, max_col=ws.max_column): # Read values from row 22
    for cell in row:
        ws.cell(row=15, column=cell.column).value = cell.value #Copy value from row 22 to new row 15
        ws.cell(row=15, column=cell.column)._style = copy(cell._style) #Copy formatting
wb.save('Sample.xlsx')
[Screenshot: how the Excel sheet looks after running the code]
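For the original goal of replacing a sheet's data while keeping its formatting, one option is to skip if_sheet_exists='replace' (which, as far as I know, creates a fresh sheet) and write the new values into the existing cells through openpyxl, since assigning to cell.value leaves the cell's style alone. The sketch below is an illustration only: write_values_in_place is a made-up helper, and the workbook is created in memory instead of being loaded from a file. For a DataFrame, the rows could come from df.itertuples(index=False).

```python
from openpyxl import Workbook
from openpyxl.styles import Font

def write_values_in_place(ws, rows, start_row=1, start_col=1):
    """Overwrite cell values only; existing fonts, fills and borders are not touched."""
    for r, row_values in enumerate(rows, start=start_row):
        for c, value in enumerate(row_values, start=start_col):
            ws.cell(row=r, column=c).value = value

# In-memory stand-in for the formatted sheet (assumption for this sketch)
wb = Workbook()
ws = wb.active
ws["A1"] = "old header"
ws["A1"].font = Font(bold=True)

write_values_in_place(ws, [["new header", "extra"], [1, 2]])
print(ws["A1"].value, ws["A1"].font.bold)
```

If the new data has fewer rows than the old data, the leftover cells would still need to be cleared separately.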
I am trying to insert values from a list into Excel. I know I could use a dictionary to do the same thing, but I would like to do it from a list. The code writes a value, but only one of them: for instance, the column shows only the value Salsa. Thank you in advance!
import openpyxl
wb = openpyxl.load_workbook("Python_Example.xlsx")
sheet = wb.active #Assumed: the snippet never defined `sheet`
list_of_music=list(sheet.columns)[4] #With this I can loop over the cells of the column at index 4 (the fifth column)
favorite_music= ['Rock','Bachata','Salsa']
for cellObj in list_of_music:
    for item in favorite_music:
        cellObj.value = str(item)
wb.save("Python_Example.xlsx")
Check the openpyxl docs; they include some good basic tutorials that will help you, especially for iterating over ranges of cells. iter_rows and iter_cols are also very useful tools that may help you here. A simple solution would consist of:
import openpyxl as op
# Create example workbook
wb = op.Workbook()
ws = wb.active
favourite_music = ['Rock','Bachata','Salsa']
for i, music in enumerate(favourite_music):
    ws.cell(row=i+1, column=4).value = music
wb.save('Example.xlsx')
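The reason the question's code leaves only Salsa is that the inner loop assigns every list item to the same cell in turn, so the last item wins. If you prefer to keep iterating over cell objects, pairing each cell with exactly one item via zip is another option. The following is a small sketch on an in-memory workbook; column 5 is used here only because the question indexes the fifth column.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
favourite_music = ["Rock", "Bachata", "Salsa"]

# One cell per list item: rows 1..3 of column 5 (column E)
cells = [ws.cell(row=r, column=5) for r in range(1, len(favourite_music) + 1)]

# zip pairs the first cell with the first item, the second with the second, and so on
for cell, music in zip(cells, favourite_music):
    cell.value = music

written = [ws.cell(row=r, column=5).value for r in range(1, 4)]
print(written)
```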
I have a somewhat large .xlsx file - 19 columns, 5185 rows. I want to open the file, read all the values in one column, do some stuff to those values, and then create a new column in the same workbook and write out the modified values. Thus, I need to be able to both read and write in the same file.
My original code did this:
def readExcel(doc):
    wb = load_workbook(generalpath + exppath + doc)
    ws = wb["Sheet1"]
    # iterate through the columns to find the correct one
    for col in ws.iter_cols(min_row=1, max_row=1):
        for mycell in col:
            if mycell.value == "PerceivedSound.RESP":
                origCol = mycell.column
    # get the column letter for the first empty column to output the new values
    newCol = utils.get_column_letter(ws.max_column+1)
    # iterate through the rows to get the value from the original column,
    # do something to that value, and output it in the new column
    for myrow in range(2, ws.max_row+1):
        myrow = str(myrow)
        # do some stuff to make the new value
        cleanedResp = doStuff(ws[origCol + myrow].value)
        ws[newCol + myrow] = cleanedResp
    wb.save(doc)
However, python threw a memory error after row 3853 because the workbook was too big. The openpyxl docs said to use Read-only mode (https://openpyxl.readthedocs.io/en/latest/optimized.html) to handle big workbooks. I'm now trying to use that; however, there seems to be no way to iterate through the columns when I add the read_only = True param:
def readExcel(doc):
    wb = load_workbook(generalpath + exppath + doc, read_only=True)
    ws = wb["Sheet1"]
    for col in ws.iter_cols(min_row=1, max_row=1):
        #etc.
python throws this error:
AttributeError: 'ReadOnlyWorksheet' object has no attribute 'iter_cols'
If I change the final line in the above snippet to:
for col in ws.columns:
python throws the same error:
AttributeError: 'ReadOnlyWorksheet' object has no attribute 'columns'
Iterating over rows is fine (and is included in the documentation I linked above):
for col in ws.rows:
(no error)
This question asks about the AttributeError, but its solution is to remove read-only mode, which doesn't work for me because openpyxl can't read my entire workbook when read-only mode is off.
So: how do I iterate through columns in a large workbook?
And I haven't yet encountered this, but I will once I can iterate through the columns: how do I both read and write the same workbook, if said workbook is large?
Thanks!
If the worksheet has only around 100,000 cells then you shouldn't have any memory problems. You should probably investigate this further.
iter_cols() is not available in read-only mode because it requires constant and very inefficient reparsing of the underlying XML file. It is, however, relatively easy to convert rows into columns from iter_rows() using zip.
def _iter_cols(self, min_col=None, max_col=None, min_row=None,
               max_row=None, values_only=False):
    yield from zip(*self.iter_rows(
        min_row=min_row, max_row=max_row,
        min_col=min_col, max_col=max_col, values_only=values_only))

import types
for sheet in workbook:
    sheet.iter_cols = types.MethodType(_iter_cols, sheet)
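To see what the zip trick does, here is a tiny stand-alone illustration with plain tuples standing in for the rows that iter_rows() would yield (the sample data is invented). Keep in mind that zip(*rows) has to read every row before it can hand back the first column, and that it stops at the length of the shortest row.

```python
# Rows as they might come out of iter_rows(values_only=True); sample data only
rows = [
    ("id", "name", "score"),
    (1, "a", 0.5),
    (2, "b", 0.7),
]

# Unpacking the rows into zip transposes them: each result tuple is one column
columns = list(zip(*rows))

for column in columns:
    print(column)
```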
According to the documentation, ReadOnly mode only supports row-based reads (column reads are not implemented). But that's not hard to solve:
from openpyxl import Workbook
wb2 = Workbook(write_only=True)
ws2 = wb2.create_sheet()
# find what column I need
colcounter = 0
for row in ws.rows:
    for cell in row:
        if cell.value == "PerceivedSound.RESP":
            break
        colcounter += 1
    # cells are apparently linked to the parent workbook meta
    # this will retain only values; you'll need custom
    # row constructor if you want to retain more
    row2 = [cell.value for cell in row]
    ws2.append(row2) # preserve the first row in the new file
    break # stop after first row
# start at row 2: the header row was already copied above
for row in ws.iter_rows(min_row=2):
    row2 = [cell.value for cell in row]
    row2.append(doStuff(row2[colcounter]))
    ws2.append(row2) # write a new row to the new wb
wb2.save('newfile.xlsx')
wb.close()
wb2.close()
# copy `newfile.xlsx` to `generalpath + exppath + doc`
# Either using os.system,subprocess.popen, or shutil.copy2()
You will not be able to write to the same workbook, but as shown above you can open a new workbook (in write-only mode), write to it, and then overwrite the old file with an OS-level copy.
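The final copy step could look roughly like the sketch below, using shutil.copy2 from the standard library. The helper name overwrite_with is made up, and the demonstration uses small placeholder text files in a temporary directory rather than real workbooks.

```python
import os
import shutil
import tempfile

def overwrite_with(new_path, target_path):
    """Copy new_path over target_path (keeping file metadata), then delete new_path."""
    shutil.copy2(new_path, target_path)
    os.remove(new_path)

# Placeholder files standing in for the old and the newly written workbook
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "original.xlsx")
new = os.path.join(workdir, "newfile.xlsx")
with open(target, "w") as fh:
    fh.write("old")
with open(new, "w") as fh:
    fh.write("new")

overwrite_with(new, target)

with open(target) as fh:
    content = fh.read()
print(content)
```

Both workbooks should be saved and closed before this step, otherwise the copy may fail or pick up an incomplete file on some systems.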
I'm trying to read data from an Excel sheet that contains merged cells.
When reading merged cells with openpyxl, the first cell of the merged range contains the value and the rest of the cells are empty.
For each cell, I would like to know whether it is merged and how many cells are merged with it, but I couldn't find any function that does so.
The sheet also has other empty cells, so I can't rely on emptiness to detect merges.
You can use merged_cells.ranges on the sheet (merged_cell_ranges was deprecated in version 2.5.0-b1 (2017-10-19) in favour of merged_cells.ranges; I can't seem to find a per-row equivalent), like this:
from openpyxl import load_workbook
wb = load_workbook(filename='a file name')
sheet_ranges = wb['Sheet1']
print(sheet_ranges.merged_cells.ranges)
To test if a single cell is merged or not you can check the class (name):
cell = sheet.cell(row=15, column=14)
if type(cell).__name__ == 'MergedCell':
    print("Oh no, the cell is merged!")
else:
    print("This cell is not merged.")
To "unmerge" all cells you can use the function unmerge_cells:
for items in list(sheet.merged_cells.ranges):  # iterate over a copy, since unmerging modifies the collection
    print(items)
    sheet.unmerge_cells(str(items))
To test if a single cell is merged, I loop through sheet.merged_cells.ranges like @A. Lau suggests.
Unfortunately, checking the cell type like @0x4a6f4672 shows does not work any more.
Here is a function that shows you how to do this.
def testMerge(row, column):
    cell = sheet.cell(row, column)
    for mergedCell in sheet.merged_cells.ranges:
        if (cell.coordinate in mergedCell):
            return True
    return False
The question asks about detecting merged cells and reading them, but so far the provided answers only deal with detecting and unmerging. Here is a function which returns the logical value of the cell, the value that the user would see as contained on a merged cell:
import sys
from openpyxl import load_workbook
from openpyxl.cell.cell import MergedCell

def cell_value(sheet, coord):
    cell = sheet[coord]
    if not isinstance(cell, MergedCell):
        return cell.value
    # "Oh no, the cell is merged!"
    for merged_range in sheet.merged_cells.ranges:
        if coord in merged_range:
            return merged_range.start_cell.value
    raise AssertionError('Merged cell is not in any merge range!')

workbook = load_workbook(sys.argv[1])
print(cell_value(workbook.active, sys.argv[2]))
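The question also asks how many cells are merged together. One way to get that, sketched here under the assumption that comparing a cell's row and column numbers against each range's min_row/max_row/min_col/max_col bounds is acceptable, is a small helper that returns the size of the enclosing merged range. The name merge_span is hypothetical and the sheet is created in memory for the demonstration.

```python
from openpyxl import Workbook

def merge_span(ws, row, column):
    """Return (n_rows, n_cols) of the merged range containing the cell, or None if it is not merged."""
    for rng in ws.merged_cells.ranges:
        inside_rows = rng.min_row <= row <= rng.max_row
        inside_cols = rng.min_col <= column <= rng.max_col
        if inside_rows and inside_cols:
            return (rng.max_row - rng.min_row + 1, rng.max_col - rng.min_col + 1)
    return None

# In-memory example sheet with one merged block of 2 rows x 3 columns
wb = Workbook()
ws = wb.active
ws["A1"] = "merged header"
ws.merge_cells("A1:C2")

print(merge_span(ws, 1, 1))   # top-left cell of the merged block
print(merge_span(ws, 2, 3))   # bottom-right cell of the same block
print(merge_span(ws, 3, 1))   # a cell outside any merged range
```

Multiplying the two numbers gives the total count of cells in the merged block.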
These all helped (thanks), but when I used the approaches with a couple of spreadsheets, they weren't unmerging all the cells I expected. I had to loop and retest for merges to finally get them all to complete. In my case, it took 4 passes to get everything to unmerge as expected:
mergedRanges = sheet_ranges.merged_cells.ranges
### How many times do we run unmerge?
i=0
### keep testing and removing ranges until they are all actually gone
while mergedRanges:
    for entry in mergedRanges:
        i+=1
        print("  unMerging: " + str(i) + ": " +str(entry))
        sheet_ranges.unmerge_cells(str(entry))
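A likely explanation for needing several passes is that unmerge_cells removes entries from the very collection the for loop is walking over, so some ranges get skipped on each pass. Iterating over a snapshot made with list() should get everything in a single pass. This is a sketch on an in-memory workbook with made-up ranges, not a tested fix for the spreadsheets mentioned above.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
for coord in ("A1:B2", "D1:E1", "A4:C4", "E5:F8"):
    ws.merge_cells(coord)

# list(...) takes a snapshot, so removing ranges does not disturb the loop
for merged_range in list(ws.merged_cells.ranges):
    ws.unmerge_cells(merged_range.coord)

remaining = len(ws.merged_cells.ranges)
print(remaining)
```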