How to split merged Excel cells with Python?

I am trying to split only the merged cells in an Excel file (with multiple sheets) that look like this:
Please note that there are partially/fully empty rows. These rows are not merged.
Using openpyxl, I found the merged cell ranges in each sheet with this code:
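from openpyxl import load_workbook
wb2 = load_workbook('Example.xlsx')
sheets = wb2.sheetnames  # ['Sheet1', 'Sheet2']
for sheet in sheets:
    ws = wb2[sheet]
    # merged_cell_ranges is the older attribute; newer openpyxl versions expose ws.merged_cells.ranges
    print(ws.merged_cell_ranges)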
The print output:
['B3:B9', 'B13:B14', 'A3:A9', 'A13:A14', 'B20:B22', 'A20:A22']
['B5:B9', 'A12:A14', 'B12:B14', 'A17:A18', 'B17:B18', 'A27:A28', 'B27:B28', 'A20:A22', 'B20:B22', 'A3:A4', 'B3:B4', 'A5:A9']
Since I found the merged cell ranges, I need to split the ranges and fill in the corresponding rows like this:
How can I split like this using openpyxl? I am new to using this module. Any feedback is greatly appreciated!

You need to use the worksheet's unmerge_cells method. Example:
ws.unmerge_cells(start_row=2,start_column=1,end_row=2,end_column=4)

When you call unmerge_cells, sheet.merged_cells.ranges is modified, so don't iterate over sheet.merged_cells.ranges directly in the for loop. Copy the range coordinates into a list first:
from openpyxl.workbook import Workbook
from openpyxl import load_workbook
from openpyxl.utils.cell import range_boundaries
wb = load_workbook(filename = 'tmp.xlsx')
for st_name in wb.sheetnames:
    st = wb[st_name]
    # copy the coordinates first, because unmerge_cells changes st.merged_cells.ranges
    mcr_coord_list = [mcr.coord for mcr in st.merged_cells.ranges]
    for mcr in mcr_coord_list:
        min_col, min_row, max_col, max_row = range_boundaries(mcr)
        top_left_cell_value = st.cell(row=min_row, column=min_col).value
        st.unmerge_cells(mcr)
        # fill every cell of the former merged range with the top-left value
        for row in st.iter_rows(min_col=min_col, min_row=min_row, max_col=max_col, max_row=max_row):
            for cell in row:
                cell.value = top_left_cell_value
wb.save('merged_tmp.xlsx')
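To try the approach without an Excel file on disk, the same steps can be wrapped in a helper and run against an in-memory workbook. This is only a minimal sketch: the function name unmerge_and_fill and the sample data are made up for illustration.

```python
from openpyxl import Workbook
from openpyxl.utils.cell import range_boundaries


def unmerge_and_fill(ws):
    """Unmerge every merged range in ws and copy the top-left value into each cell."""
    # Snapshot the coordinates first: unmerge_cells() changes ws.merged_cells.ranges.
    for coord in [rng.coord for rng in ws.merged_cells.ranges]:
        min_col, min_row, max_col, max_row = range_boundaries(coord)
        value = ws.cell(row=min_row, column=min_col).value
        ws.unmerge_cells(coord)
        for row in ws.iter_rows(min_row=min_row, max_row=max_row,
                                min_col=min_col, max_col=max_col):
            for cell in row:
                cell.value = value


# Hypothetical sample data: A1:A3 merged, column B left alone.
wb = Workbook()
ws = wb.active
ws["A1"] = "fruit"
ws.merge_cells("A1:A3")
ws["B2"] = "untouched"

unmerge_and_fill(ws)
filled = [ws.cell(row=r, column=1).value for r in range(1, 4)]
```

Cells outside the merged ranges, including empty ones, are not written to, which matches the requirement that the partially empty rows stay as they are.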

Related

Openpyxl to delete Table row in Excel

I'm having a hard time figuring out how to delete an entire empty row when the row is part of an Excel Table.
I tried the following code, but it keeps the table formatting, and functions such as COUNTIF still count those blank rows.
from openpyxl import load_workbook as lw
wb = lw(file)
ws = wb['Sheet']
endrow = 10  # target row from which I will delete
# delete entire rows from endrow to the end of the sheet
for i in range(endrow, ws.max_row + 1):
    ws.delete_rows(i)
I want those rows to be completely blank, as in a new file: not part of a table and without formatting.
Regards.
I used this solution:
import xlwings as xw
from xlwings.constants import DeleteShiftDirection
app = xw.App(visible=False)
wb = app.books.open('PathtoFile')
sht = wb.sheets['SheetName']
endrow = XX  # row number from which to delete downward
# Delete from endrow down to row 10,000
sht.range(str(endrow)+':10000').api.Delete(DeleteShiftDirection.xlShiftUp)
wb.save()
app.kill()
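If Excel is not available for xlwings to drive, the cell data can also be removed with openpyxl alone. The sketch below uses an in-memory workbook with made-up values and passes an amount to delete_rows so that the whole tail goes in one call. Deleting row by row in an ascending loop skips rows, because the rows below shift up after every deletion. Note that the openpyxl documentation says dependencies such as tables are not managed when rows are deleted, so this clears the cells but does not resize a table.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
# Hypothetical data: rows 1..20, each holding its own row number in column A.
for r in range(1, 21):
    ws.cell(row=r, column=1, value=r)

endrow = 10  # first row to delete
# Delete endrow and everything below it in a single call.
ws.delete_rows(endrow, ws.max_row - endrow + 1)

remaining = [row[0] for row in ws.iter_rows(min_col=1, max_col=1, values_only=True)]
```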

How to read a specific cell with the pandas library?

I want to read a specific cell, H6, from an Excel sheet. So I tried it like this:
import pandas as pd
excel_file = './docs/fruit.xlsx'
df = pd.read_excel(excel_file,'Overzicht')
sheet = df.active
x1 = sheet['H6'].value
print(x1)
But then I get this error:
File "C:\Python310\lib\site-packages\pandas\core\generic.py", line 5575, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'active'
So my question is: how do I read a specific cell from a sheet in an Excel workbook?
Thank you
OK, I tried with openpyxl:
import openpyxl
path = "./docs/fruit.xlsx"
wb_obj = openpyxl.load_workbook(path)
sheet_obj = wb_obj.active
cell_obj = sheet_obj.cell(row = 6, column = 9)
print(cell_obj.value)
But then the formula is printed. Like this:
=(H6*1000)/F6/G6
and not the value: 93
You can do this using openpyxl directly or pandas (which uses openpyxl internally to read .xlsx files).
Using Openpyxl
You need to pass data_only=True when you open the file, so that openpyxl returns the cached result of each formula instead of the formula text. Also make sure you use the right row and column numbers: for H6, the row is 6 and the column is 8 (H is the 8th letter).
import openpyxl
path = "./docs/Schoolfruit.xlsx"
wb_obj = openpyxl.load_workbook(path, data_only=True)
sheet_obj = wb_obj.active ## Or use sheet_obj = wb_obj['Sheet1'] if you know sheet name
val = sheet_obj.cell(row = 6, column = 8).value
print(val)
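Note that column = 9 in the question's attempt addresses column I, while H is column 8. To avoid counting letters by hand, openpyxl has helpers for converting between column letters and numbers, and a worksheet can also be indexed by the cell address directly. A small sketch on an in-memory workbook (the value 93 is just a stand-in):

```python
from openpyxl import Workbook
from openpyxl.utils import column_index_from_string, get_column_letter

wb = Workbook()
ws = wb.active
ws["H6"] = 93  # stand-in value

h_index = column_index_from_string("H")  # letter -> 1-based column number
ninth_letter = get_column_letter(9)      # 1-based column number -> letter

by_address = ws["H6"].value
by_numbers = ws.cell(row=6, column=h_index).value
```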
Using Pandas
The other option is pandas read_excel(), which reads the whole sheet into a DataFrame. You can then use iloc[] or at[] to read the specific cell. Note that this is probably the less efficient solution if you need to read just one cell.
Another point to note: once the data is in a DataFrame, sheet row 1 is used as the header and sheet row 2 becomes index 0, so sheet row 6 is index 4, not 6. Similarly, columns are counted from 0 instead of 1, so column H is index 7. The position of H6 is therefore [4, 7].
import pandas as pd
path = "./docs/Schoolfruit.xlsx"
df = pd.read_excel(path, 'Sheet1')
print(df.iloc[4,7])
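The offset arithmetic is easy to get wrong, so here is a small helper that does it. It is a sketch in plain Python; the name excel_to_position and the rows_consumed parameter are invented for this example and are not part of pandas.

```python
import re


def excel_to_position(address, rows_consumed=1):
    """Map an address like 'H6' to a zero-based (row, column) pair for DataFrame.iloc.

    rows_consumed is the number of sheet rows above the data that do not become
    DataFrame rows (1 when the first sheet row is used as the header).
    """
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", address)
    if match is None:
        raise ValueError(f"not a cell address: {address!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters form a base-26 number without a zero digit (A=1 ... Z=26).
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1 - rows_consumed, col - 1


pos = excel_to_position("H6")
```

With the default header handling, pos can be passed straight to iloc, for example df.iloc[pos].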
I found a solution and hope it works for you.
import pandas as pd
excel_file = './docs/Schoolfruit.xlsx'
df = pd.read_excel(excel_file, sheet_name='active', header=None, skiprows=1)  # sheet_name must match a sheet in the workbook
print(df[7][4])
7: column H (column index starts at 0)
4: row 6 (the first row is skipped and the row index starts at 0)

Creating and naming column in openpyxl

Background-
The following code snippet iterates over all worksheets in a workbook and writes a formula to the last column of each -
import openpyxl
filename = 'filename.xlsx'
filename_output = 'filename_output.xlsx'
wb = openpyxl.load_workbook(filename)
for sheet in wb.worksheets:
    sheet.insert_cols(sheet.max_column)
    for row in sheet.iter_rows():
        row[-1].value = "=SUMIFS(J:J,M:M,#M:M)"
wb.save(filename_output)
Question - I cannot find documentation on how to name the column. Does anyone know how to achieve this?
Context - I want this column (in each worksheet) to be called 'Calculation'.
To get the last column, you can use sheet.max_column. Once you have updated the formulas, you can use sheet.cell(1, col).value = "Calculation" to set the header. Updated code below:
import openpyxl
filename = 'filename.xlsx'
filename_output = 'filename_output.xlsx'
wb = openpyxl.load_workbook(filename)
for sheet in wb.worksheets:
    sheet.insert_cols(sheet.max_column)
    for row in sheet.iter_rows():
        row[-1].value = "=SUMIFS(J:J,M:M,#M:M)"
    sheet.cell(1, sheet.max_column).value = "Calculation"  # add this line inside the outer for loop
wb.save(filename_output)
Output would look something like this.
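One thing to watch: insert_cols(idx) inserts the new column before idx, so insert_cols(sheet.max_column) pushes the existing last column one place to the right, and row[-1] then points at that shifted column rather than at the new blank one. If the goal is a new column after the existing data, writing to max_column + 1 avoids the shift. A minimal sketch on an in-memory workbook; the sample data and the simple =A+B formula are placeholders, not the SUMIFS formula from the question:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
# Hypothetical sample data: a header row and two data rows.
for values in (["a", "b"], [1, 2], [3, 4]):
    ws.append(values)

new_col = ws.max_column + 1  # first free column to the right of the data
ws.cell(row=1, column=new_col, value="Calculation")
for r in range(2, ws.max_row + 1):
    # Placeholder formula; substitute the real one here.
    ws.cell(row=r, column=new_col, value=f"=A{r}+B{r}")

header = list(next(ws.iter_rows(min_row=1, max_row=1, values_only=True)))
```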

Reading values from Excel workbook in every worksheet in particular column

I'd like to read the values from column B in every worksheet within my workbook.
After a fair amount of reading and playing around I can return the cell names of the cells I want the values from, but I can't figure out how to get the values.
from openpyxl import load_workbook
wb = load_workbook(r"C:/Users/username/Documents/test.xlsx")
for sheet in wb.worksheets:
    for row in range(2,sheet.max_row+1):
        for column in "B":
            cell_name = "{}{}".format(column, row)
            print (cell_name)
This is returning the cell names (i.e. B2, B3) that have values in column B in every worksheet.
According to the documentation https://openpyxl.readthedocs.io/en/stable/usage.html you can access cell values as:
sheet['B5'].value
Replace B5 with the cell(s) you need.
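import xlrd
# note: xlrd 2.0 and later only read .xls files, so an .xlsx file needs an older xlrd (or openpyxl)
loc = ("foo.xlsx") # excel file name
wb = xlrd.open_workbook(loc)
# sheet = wb.sheet_by_index(0)
for sheet in wb.sheets():
    for i in range(sheet.nrows):
        print(sheet.cell_value(i, 1))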
Edit: I edited my answer to read all sheets in excel file.
Just adjust the range:
from openpyxl import load_workbook
wb = load_workbook('')
for sheet in wb:
    for i in range(1,50):
        if sheet['B'+str(i)].value:
            print(sheet['B'+str(i)].value)
A better one:
from openpyxl import load_workbook
wb = load_workbook('')
for sheet in wb:
    for cell in sheet['B']:
        print(cell.value)
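Another option is iter_rows with values_only=True, which yields plain values instead of cell objects and lets the header row be skipped with min_row. The sketch below builds a small in-memory workbook to show the pattern; the sheet names and data are invented for the example.

```python
from openpyxl import Workbook

wb = Workbook()
first = wb.active                     # default sheet, titled "Sheet"
second = wb.create_sheet("Second")
# Hypothetical data: a header in row 1 and values below it.
for ws, names in ((first, ["x", "y"]), (second, ["z"])):
    ws.append(["id", "name"])
    for i, name in enumerate(names, start=1):
        ws.append([i, name])

column_b = {}
for ws in wb.worksheets:
    # min_col=max_col=2 restricts the iteration to column B; min_row=2 skips the header.
    rows = ws.iter_rows(min_row=2, min_col=2, max_col=2, values_only=True)
    column_b[ws.title] = [value for (value,) in rows]
```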

Writing multi-line strings into cells using openpyxl

I'm trying to write data that contains multiple line breaks (I believe \n) into a cell, but the resulting .xlsx has the line breaks removed.
Is there a way to keep these line breaks?
The API for styles changed for openpyxl >= 2. The following code demonstrates the modern API.
from openpyxl import Workbook
from openpyxl.styles import Alignment
wb = Workbook()
ws = wb.active # wb.active returns a Worksheet object
ws['A1'] = "Line 1\nLine 2\nLine 3"
ws['A1'].alignment = Alignment(wrapText=True)
wb.save("wrap.xlsx")
Disclaimer: This won't work in recent versions of Openpyxl. See other answers.
In openpyxl you can set the wrap_text alignment property to wrap multi-line strings:
from openpyxl import Workbook
workbook = Workbook()
worksheet = workbook.worksheets[0]
worksheet.title = "Sheet1"
worksheet.cell('A1').style.alignment.wrap_text = True
worksheet.cell('A1').value = "Line 1\nLine 2\nLine 3"
workbook.save('wrap_text1.xlsx')
This is also possible with the XlsxWriter module.
Here is a small working example:
from xlsxwriter.workbook import Workbook
# Create a new Excel file and add a worksheet.
workbook = Workbook('wrap_text2.xlsx')
worksheet = workbook.add_worksheet()
# Widen the first column to make the text clearer.
worksheet.set_column('A:A', 20)
# Add a cell format with text wrap on.
cell_format = workbook.add_format({'text_wrap': True})
# Write a wrapped string to a cell.
worksheet.write('A1', "Line 1\nLine 2\nLine 3", cell_format)
workbook.close()
As an additional option, you can use a triple-quoted string (""" my cell info here """) together with wrapText=True in the alignment and get the desired result as well.
from openpyxl import Workbook
from openpyxl.styles import Alignment
wb = Workbook()
sheet = wb.active
sheet.title = "Sheet1"
sheet['A1'] = """Line 1
Line 2
Line 3"""
sheet['A1'].alignment = Alignment(wrapText=True)
wb.save('wrap_text1.xlsx')
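To check that the line breaks really are kept, the workbook can be saved to an in-memory buffer and read back. This is just a sketch of such a check: the newline characters are stored in the cell value itself, and the wrap setting only controls whether Excel displays them as separate lines.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook
from openpyxl.styles import Alignment

wb = Workbook()
ws = wb.active
ws["A1"] = "Line 1\nLine 2\nLine 3"
ws["A1"].alignment = Alignment(wrap_text=True)

# Save to an in-memory buffer instead of a file, then load it again.
buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)
cell = load_workbook(buffer).active["A1"]
value, wrapped = cell.value, cell.alignment.wrap_text
```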
Just in case anyone is looking for an example where we iterate over all cells to apply wrapping:
Small working example:
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Alignment
from openpyxl.utils.dataframe import dataframe_to_rows
# create a toy dataframe. Our goal is to replace commas (',') with line breaks and have Excel rendering \n as line breaks.
df = pd.DataFrame(data=[["Mark", "Student,26 y.o"],
                        ["Simon", "Student,31 y.o"]],
                  columns=['Name', 'Description'])
# replace comma "," with '\n' in all cells
df = df.applymap(lambda v: v.replace(',', '\n') if isinstance(v, str) else v)
# Create an empty openpyxl Workbook. We will populate it by iteratively adding the dataframe's rows.
wb = Workbook()
ws = wb.active # to get the actual Worksheet object
# dataframe_to_rows lets us iterate over a dataframe with an interface
# compatible with openpyxl. Each df row will be added to the worksheet.
for r in dataframe_to_rows(df, index=True, header=True):
    ws.append(r)
# iterate over each row and row's cells and apply text wrapping.
for row in ws:
    for cell in row:
        cell.alignment = Alignment(wrapText=True)
# export the workbook as an excel file.
wb.save("wrap.xlsx")
