How to delete a whole row if all cells contain a certain value? - python

I am trying to delete all rows from an Excel sheet that satisfy this condition: every cell in the row contains only the value "-" or "".
I use Python and openpyxl.
But my code doesn't work:
import openpyxl
wb1 = openpyxl.load_workbook(filename="testat_openpyxl.xlsx")
instr=""
for i in range(5, wb1["Centralizator"].max_row):
    for j in range(7,49):
        celval=wb1["Centralizator"].cell(row=i,column=j).value
        ins='{}=="-" or {}=="" and '.format(celval)
        instr=instr + ins
    if instr[:-3]:
        wb1["Centralizator"].delete_rows(i,1)
wb1.save('testat_openpyxl.xlsx')
My idea is to build one big "if" statement that checks every cell in a row:
if wb1["Centralizator"].cell(row=5,column=7).value=="-" or
wb1["Centralizator"].cell(row=5,column=7).value=="" and
wb1["Centralizator"].cell(row=5,column=8).value=="-" or
wb1["Centralizator"].cell(row=5,column=8).value==""
wb1["Centralizator"].cell(row=5,column=9).value=="-" or
wb1["Centralizator"].cell(row=5,column=9).value==""
.........loop until column 48............
wb1["Centralizator"].cell(row=5,column=48).value=="-" or
wb1["Centralizator"].cell(row=5,column=48).value==""
now jump to next row
if wb1["Centralizator"].cell(row=6,column=7).value=="-" or
wb1["Centralizator"].cell(row=6,column=7).value=="" and
wb1["Centralizator"].cell(row=6,column=8).value=="-" or
wb1["Centralizator"].cell(row=6,column=8).value==""
.........loop until column 48...........
wb1["Centralizator"].cell(row=6,column=48).value=="-" or
wb1["Centralizator"].cell(row=6,column=48).value==""
next row.......until max_row....
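One way to express the "every cell in the row is a placeholder" test without assembling a condition string is Python's built-in all(). A non-empty string is always truthy, so testing the assembled text never evaluates the comparisons inside it. The following is a minimal sketch, not a definitive fix: the helper names row_is_blank and delete_blank_rows and the small in-memory workbook are assumptions for illustration, and None is treated as blank because openpyxl returns None for empty cells.

```python
from openpyxl import Workbook


def row_is_blank(ws, row, first_col, last_col, blanks=("-", "", None)):
    """Return True when every cell in the column span holds a placeholder."""
    return all(
        ws.cell(row=row, column=col).value in blanks
        for col in range(first_col, last_col + 1)
    )


def delete_blank_rows(ws, first_row, first_col, last_col):
    """Delete placeholder-only rows and return how many were removed."""
    deleted = 0
    # Walk bottom-up so that deleting a row never shifts a row
    # that still has to be checked.
    for row in range(ws.max_row, first_row - 1, -1):
        if row_is_blank(ws, row, first_col, last_col):
            ws.delete_rows(row, 1)
            deleted += 1
    return deleted


# Hypothetical demo data: column 1 is an id, columns 2-3 are checked.
wb = Workbook()
ws = wb.active
ws.append(["id", "a", "b"])
ws.append([1, "-", None])
ws.append([2, "x", "-"])
ws.append([3, "", "-"])

removed = delete_blank_rows(ws, first_row=2, first_col=2, last_col=3)
remaining_ids = [ws.cell(row=r, column=1).value for r in range(1, ws.max_row + 1)]
```

For the sheet in the question, the call would presumably be delete_blank_rows(wb1["Centralizator"], first_row=5, first_col=7, last_col=48), followed by wb1.save(...).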

Related

Python/Openpyxl: Merge empty row cells delimited by string

I am trying to create a script using python and openpyxl to open up a given excel sheet and merge all cells in a given row together until the script finds a cell containing a string. The row placement is always the same, but the number of columns and the column placement of the strings is not so it needs to be dynamic. Once a new string is found, I want to continue to merge cells until the column that is right before the grand total. There are also cases where the cell doesn't need to be merged, because there is no empty cell in the data set to merge it with.
I found this answer here, which is doing a similar procedure except it is merging rows instead of columns. I was able to refactor part of this to create a list of the cells that have strings in my workbook, but am struggling on next steps. Any thoughts?
import openpyxl
from openpyxl.utils import get_column_letter
from openpyxl import Workbook
wb1 = openpyxl.load_workbook('stackoverflow question.xlsx')
ws1 = wb1.worksheets[0]
columns_with_strings = []
merge_row = '3' #the data to merge will always be in this row
for col in range (2, ws1.max_column-1):
    for row in merge_row:
        if ws1[get_column_letter(col) + merge_row].value != None:
            columns_with_strings.append(str(get_column_letter(col) + merge_row))
The above code yields this list which includes the correct cells that contain strings and need to be checked for merging:
['C3', 'F3', 'J3']
This is how the workbook looks now:
This is how I am trying to get it to look in the end:
To complete your code, you can use worksheet.merge_cells together with the cell's alignment attribute:
from openpyxl import load_workbook
from openpyxl.styles import Alignment
wb = load_workbook("tmp/stackoverflow question.xlsx")
ws = wb["Sheet1"]
merge_row = 3
#here, we get the columns idx for every non null cell in row 3
#and after that, we make a text alignment (center) in the last cell
idx_col_strings = [cell.column for cell in ws[merge_row] if cell.value]
ws.cell(3, idx_col_strings[-1]).alignment = Alignment(horizontal="center")
#here, we loop through each range until the last non null cell in row 3
#then, we make a merge as much as the number of transitions (non null => null)
#and finally, we make a text alignment (center) for each cell/merge
for i in range(len(idx_col_strings)-1):
    start_col, end_col = idx_col_strings[i], idx_col_strings[i+1]-1
    ws.merge_cells(start_row=merge_row, start_column=start_col,
                   end_row=merge_row, end_column=end_col)
    ws.cell(merge_row, start_col).alignment = Alignment(horizontal="center")
wb.save("tmp/stackoverflow answer.xlsx")
To start, if you aren't familiar with openpyxl's merge and unmerge functions, I recommend you read about them in the documentation (https://openpyxl.readthedocs.io/en/stable/usage.html#merge-unmerge-cells) to get a sense of how they work.
Here is base code that should provide the functionality you want, although some values may need to be tweaked for your spreadsheet.
import openpyxl # Necessary imports.
from openpyxl.utils import get_column_letter
from openpyxl.utils.cell import coordinate_from_string
from openpyxl.utils.cell import column_index_from_string
from openpyxl import Workbook
wb1 = openpyxl.load_workbook('stackoverflow question.xlsx') # Start of your code.
ws1 = wb1.worksheets[0]
columns_with_strings = []
merge_row = '3' #the data to merge will always be in this row
for col in range (2, ws1.max_column):
    for row in merge_row:
        if ws1[get_column_letter(col) + merge_row].value != None:
            columns_with_strings.append(str(get_column_letter(col) + merge_row)) # End of your code.
prior_string = columns_with_strings[0] # Set the "prior_string" to be the first detected string.
for cell in columns_with_strings:
    coords = coordinate_from_string(cell) # Split the current cell into its letter and number components.
    if column_index_from_string(coords[0]) >1:
        prior = str(get_column_letter(column_index_from_string(coords[0])-1)) + str(coords[1]) # Get the cell that is left of the current cell.
        if prior > prior_string:
            ws1.merge_cells(f'{prior_string}:{prior}') # Merge the cells.
    prior_string=cell # Set the current string to be the prior string.
ws1.merge_cells(f'{cell}:{get_column_letter(ws1.max_column)+str(coords[1])}') # Merge the last string to the end (the last column).
wb1.save("stackoverflow question.xlsx") # Save the file changes.
I hope this helps to point you in the right direction!
Based on @timeless' answer, I've cleaned the code up a bit to make better use of Python's tools and the openpyxl API:
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
ws.append([])
ws.append([])
ws.append([None, None, "Group A", None, None, "Group B", None, None, None, "Group C"])
# get column indices for header cells
headings = [cell.column for cell in next(ws.iter_rows(min_row=3, max_row=3)) if cell.value]
from openpyxl.styles import Alignment, PatternFill, NamedStyle
fill = PatternFill(patternType="solid", fgColor="DDDDDD")
alignment = Alignment(horizontal="center")
header_style = NamedStyle(alignment=alignment, fill=fill, name="Header")
wb.named_styles.append(header_style)
from itertools import zip_longest
# create ranges for merged cells from the list of header cells: the end of
# each range is the index of the start of the next one minus 1.
# Use zip_longest to supply the final value.
for start_column, end_column in zip_longest(headings, headings[1:], fillvalue=headings[-1]+1):
    ws.cell(3, start_column).style = header_style
    ws.merge_cells(start_row=3, end_row=3, start_column=start_column, end_column=end_column-1)
wb.save("merged.xlsx")
Using the API wherever possible generally leads to more manageable and generic code.

Changing Existing Cell Value with Openpyxl

I am working on some code that changes cells in an existing .xlsx file.
I am looking for all empty (None) cells in column A and trying to copy the last non-empty value above into each of them.
Unfortunately, the cell values are not overwritten by the following lines.
Only if I hard-code a row number, e.g. 18, do I get the right result, but I want to iterate.
import openpyxl
path= 'mypath/Python/Excel_test.xlsx'
workbook = openpyxl.load_workbook( path )
worksheet = workbook.get_sheet_by_name('Tabelle1')
i=1
for row in worksheet.iter_rows(values_only=True):
    if row[0]!=None:
        text=row[0]
        print(row)
    else:
        worksheet.cell(row=i,column=1).value=text
        print(row)
        i=i+1
workbook.save('test.xlsx')
Thanks a lot
Dominik
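A possible way to avoid the manual row counter is to iterate over the cell objects of column A themselves, so each cell can be written to directly and no index can drift out of step with the loop. The following is a minimal sketch under assumptions: the helper name fill_down and the in-memory demo workbook are hypothetical, and the sketch does not try to diagnose the exact cause in the code above.

```python
from openpyxl import Workbook


def fill_down(ws, column=1):
    """Copy the last non-empty value of a column into the empty cells below it."""
    last_value = None
    filled = 0
    # Restricting iter_rows to one column yields 1-tuples of Cell objects.
    for (cell,) in ws.iter_rows(min_col=column, max_col=column):
        if cell.value is not None:
            last_value = cell.value
        elif last_value is not None:
            cell.value = last_value
            filled += 1
    return filled


# Hypothetical demo data: column A has gaps, column B keeps every row alive.
wb = Workbook()
ws = wb.active
for value in ["north", None, None, "south", None]:
    ws.append([value, "x"])

filled = fill_down(ws)
column_a = [row[0] for row in ws.iter_rows(values_only=True)]
```

Note that the changes only reach disk in the file passed to workbook.save(); the code above saves to 'test.xlsx', not to the file that was loaded.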

Write an excel formula all column with python

I have an existing Excel document and want to update column M according to column A. I want to start from the second row so that the header in the first row is preserved.
Here is my code:
import openpyxl
wb = openpyxl.load_workbook('D:\Documents\Desktop\deneme/formula.xlsx')
ws=wb['Sheet1']
for i, cellObj in enumerate(ws['M'], 1):
    cellObj.value = '=_xlfn.ISOWEEKNUM(A2)'.format(i)
wb.save('D:\Documents\Desktop\deneme/formula.xlsx')
When I run that code:
- the first row (the header) changes.
- every cell in the column contains "ISOWEEKNUM(A2)", but I want the reference to follow the row number (A3, A4, A5, ... giving "ISOWEEKNUM(A3)", "ISOWEEKNUM(A4)", "ISOWEEKNUM(A5)", ...).
Edit:
I have worked around the ISOWEEKNUM issue for now with the code below, in which I changed A2 to A2:A5.
import openpyxl
wb = openpyxl.load_workbook('D:\Documents\Desktop\deneme/formula.xlsx')
ws=wb['Sheet1']
for i, cellObj in enumerate(ws['M'], 1):
    cellObj.value = '=_xlfn.ISOWEEKNUM(A2:A5)'.format(i)
wb.save('D:\Documents\Desktop\deneme/formula.xlsx')
But still starts from first row.
Here is an answer using pandas.
Let us consider the following spreadsheet:
First import pandas:
import pandas as pd
Then load the third sheet of your excel workbook into a dataframe called df:
df=pd.read_excel('D:\Documents\Desktop\deneme/formula.xlsx', sheet_name='Sheet3')
Update column 'column_to_update' using column 'deneme'. The line below converts the dates in the 'deneme' column from strings to datetime objects and then returns the ISO week of the year for each of those dates (dt.isocalendar() requires pandas 1.1 or later; the older dt.week accessor was removed in pandas 2.0).
df['column_to_update'] = pd.to_datetime(df['deneme']).dt.isocalendar().week
You can then save your dataframe to a new excel document:
df.to_excel('./newspreadsheet.xlsx', index=False)
Here is the result:
You can see that the values in 'column_to_update' got updated from 1, 2 and 3 to 12, 12 and 18.
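If the formula itself should stay in the workbook, a possible openpyxl-only approach is to build each cell reference from the row number and to start the loop at row 2 so the header is left alone. This is a minimal sketch: the helper name write_week_formulas, its parameters, and the in-memory demo workbook are assumptions for illustration, not part of the original code.

```python
import datetime

from openpyxl import Workbook


def write_week_formulas(ws, source_col="A", target_col="M", first_row=2):
    """Write one ISOWEEKNUM formula per data row, leaving the header row alone."""
    for row in range(first_row, ws.max_row + 1):
        # The row number is interpolated, so each formula points at its own row.
        ws[f"{target_col}{row}"] = f"=_xlfn.ISOWEEKNUM({source_col}{row})"


# Hypothetical demo data: a header row followed by three dates in column A.
wb = Workbook()
ws = wb.active
ws.append(["date"])
ws["M1"] = "week"
for day in (datetime.date(2020, 3, 16), datetime.date(2020, 3, 20), datetime.date(2020, 4, 27)):
    ws.append([day])

write_week_formulas(ws)
formulas = [ws.cell(row=r, column=13).value for r in range(1, ws.max_row + 1)]
```

openpyxl stores the formula text only; the week numbers are calculated when the file is opened in Excel.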

XLRD: Start reading a column from a specific cell / range (Python)

I am trying to read all values in the first sheet of an Excel file via xlrd, but I need it to start reading from row 3 of the sheet and continue to the end of the values in the column.
The current version reads everything in the columns, including the headers, which is not desired.
Code:
for col in range(sheet.nrows):
    names = sheet.cell(col,0)
    nums = sheet.cell(col,1)
    if names.value != xlrd.empty_cell.value:
        if nums.value != xlrd.empty_cell.value:
            f.write('\t\t\t\t\t\t\t\t\t'+ '<li><strong>' + names.value + '</strong> '+ repr(nums.value)+'</li>' + "\n")
Change the start index of your loop: for col in range(2, sheet.nrows): should give the desired behaviour.
As a side note, you should really rename your variables: col holds a row index here, which invites all kinds of confusion.
EDIT: note that xlrd is 0-indexed, so index 2 is the third row of the sheet.
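The advice above could be sketched as a small generator that starts at 0-based row index 2 and uses a row-oriented variable name. It relies only on the nrows attribute and the cell_value(rowx, colx) method of an xlrd sheet, and on the fact that xlrd reports empty cells as an empty string. The function name named_numbers and the FakeSheet stand-in, which exists only so the sketch runs without an .xls file, are assumptions for illustration.

```python
def named_numbers(sheet, first_row=2):
    """Yield (name, number) pairs, starting at 0-based row index first_row."""
    for row in range(first_row, sheet.nrows):
        name = sheet.cell_value(row, 0)
        number = sheet.cell_value(row, 1)
        # xlrd represents an empty cell's value as an empty string.
        if name != "" and number != "":
            yield name, number


class FakeSheet:
    """Hypothetical stand-in exposing the two xlrd sheet members used above."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def cell_value(self, rowx, colx):
        return self._rows[rowx][colx]


sheet = FakeSheet([
    ["Report", ""],       # row index 0: title, skipped
    ["Name", "Number"],   # row index 1: headers, skipped
    ["Ann", 1.0],
    ["", 2.0],            # incomplete row, filtered out
    ["Bob", 3.0],
])
pairs = list(named_numbers(sheet))
```

With a real workbook, sheet would come from xlrd.open_workbook(...).sheet_by_index(0) instead of FakeSheet.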

How to detect merged cells in an Excel sheet?

I'm trying to read data from an Excel sheet that contains merged cells.
When reading merged cells with openpyxl, the first merged cell contains the value and the rest of the cells are empty.
For each cell, I would like to know whether it is merged and how many cells are merged with it, but I couldn't find any function that does so.
The sheet also has other empty cells, so I can't rely on emptiness to detect merges.
You can use merged_cells.ranges (merged_cell_ranges has been deprecated in version 2.5.0-b1 (2017-10-19), changed to merged_cells.ranges) on the sheet (can't seem to find per row) like this:
from openpyxl import load_workbook
wb = load_workbook(filename='a file name')
sheet_ranges = wb['Sheet1']
print(sheet_ranges.merged_cells.ranges)
To test whether a single cell is merged, you can check its class name:
cell = sheet_ranges.cell(row=15, column=14)
if type(cell).__name__ == 'MergedCell':
    print("Oh no, the cell is merged!")
else:
    print("This cell is not merged.")
To "unmerge" all cells you can use the function unmerge_cells:
# iterate over a copy, because unmerge_cells modifies merged_cells.ranges
for items in list(sheet_ranges.merged_cells.ranges):
    print(items)
    sheet_ranges.unmerge_cells(str(items))
To test if a single cell is merged, I loop through sheet.merged_cells.ranges like @A. Lau suggests.
Unfortunately, checking the cell type like @0x4a6f4672 shows does not work any more.
Here is a function that shows you how to do this.
def testMerge(row, column):
    cell = sheet.cell(row, column)
    for mergedCell in sheet.merged_cells.ranges:
        if cell.coordinate in mergedCell:
            return True
    return False
The question asks about detecting merged cells and reading them, but so far the provided answers only deal with detecting and unmerging. Here is a function which returns the logical value of the cell, the value that the user would see as contained on a merged cell:
import sys
from openpyxl import load_workbook
from openpyxl.cell.cell import MergedCell
def cell_value(sheet, coord):
    cell = sheet[coord]
    if not isinstance(cell, MergedCell):
        return cell.value
    # "Oh no, the cell is merged!"
    for merged_range in sheet.merged_cells.ranges:
        if coord in merged_range:
            return merged_range.start_cell.value
    raise AssertionError('Merged cell is not in any merge range!')
workbook = load_workbook(sys.argv[1])
print(cell_value(workbook.active, sys.argv[2]))
These all helped (thanks), but when I used these approaches on a couple of spreadsheets, they did not unmerge all the cells I expected. I had to loop and retest for merges until they were all gone. In my case, it took 4 passes to get everything to unmerge as expected:
mergedRanges = sheet_ranges.merged_cells.ranges
### How many times do we run unmerge?
i=0
### keep testing and removing ranges until they are all actually gone
while mergedRanges:
    for entry in mergedRanges:
        i+=1
        print("  unMerging: " + str(i) + ": " +str(entry))
        sheet_ranges.unmerge_cells(str(entry))
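The repeated passes are probably needed because unmerge_cells() removes entries from merged_cells.ranges while the loop is still iterating over that same collection, so some entries get skipped. Assuming that is the cause, taking a snapshot of the range coordinates first should unmerge everything in one pass. This is a minimal sketch: the helper name unmerge_all and the in-memory demo workbook are assumptions for illustration.

```python
from openpyxl import Workbook


def unmerge_all(ws):
    """Unmerge every merged range on the sheet and return how many were removed."""
    # Snapshot the coordinates first: unmerge_cells() mutates
    # ws.merged_cells, so the live collection must not be iterated.
    snapshot = [merged_range.coord for merged_range in ws.merged_cells.ranges]
    for coord in snapshot:
        ws.unmerge_cells(coord)
    return len(snapshot)


# Hypothetical demo data: four merged ranges on an otherwise empty sheet.
wb = Workbook()
ws = wb.active
for coord in ("A1:B2", "D1:F1", "A5:A9", "C7:E8"):
    ws.merge_cells(coord)

count = unmerge_all(ws)
remaining = len(ws.merged_cells.ranges)
```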
