Create table in excel - Python openpyxl - python

Is there a way to create a table based on all rows and columns which contain data? Normally a table is created with a fixed reference (e.g. ref="A1:E5"). What I need is for the script to find the last row on the sheet and use it as the reference. The sheet I need to edit contains a different number of rows each time, and a fixed reference would include empty rows in the table.
I have this as a macro in Excel but want to convert it to Python with openpyxl.
Excel Macro
Sub A2_SelectAllMakeTable()
lrow = ActiveSheet.Cells(Rows.Count, "A").End(xlUp).Row
lCol = ActiveSheet.Cells(1, Columns.Count).End(xlToLeft).Column
ActiveSheet.ListObjects.Add(xlSrcRange, Range(Cells(1, 1), Cells(lrow, lCol)), , xlYes).Name = "Masterdata"
End Sub
python code start:
from openpyxl import load_workbook
wb = load_workbook('export1.XLSX')
ws1 = wb["Sheet1"]
ws1.title = "Masterdata"

You can use get_column_letter, max_column, and max_row on your worksheet sheet, like so:
from openpyxl.worksheet.table import Table
from openpyxl.utils import get_column_letter
table = Table(displayName="Table1", ref="A1:" + get_column_letter(sheet.max_column) + str(sheet.max_row))
sheet.add_table(table)
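One caveat with max_row and max_column: as far as I can tell they count any cell that exists in the sheet, including cells that only carry formatting. A minimal sketch that walks back to the last cell holding a value, roughly what End(xlUp) and End(xlToLeft) do in the macro, could look like this. The helper name used_range_ref and the sample data are made up for illustration, and it assumes column A and row 1 are the right places to look.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter
from openpyxl.worksheet.table import Table


def used_range_ref(ws, key_col=1, header_row=1):
    """Return an A1-style reference from A1 to the last filled row and column."""
    # Walk up key_col from the bottom until a cell holds a value,
    # similar to Cells(Rows.Count, "A").End(xlUp) in the macro.
    last_row = ws.max_row
    while last_row > header_row and ws.cell(row=last_row, column=key_col).value is None:
        last_row -= 1
    # Walk left along the header row, similar to End(xlToLeft).
    last_col = ws.max_column
    while last_col > 1 and ws.cell(row=header_row, column=last_col).value is None:
        last_col -= 1
    return f"A1:{get_column_letter(last_col)}{last_row}"


# Hypothetical in-memory data standing in for export1.XLSX
wb = Workbook()
ws = wb.active
ws.append(["id", "name", "qty"])
ws.append([1, "bolt", 10])
ws.append([2, "nut", 25])
ws["A10"].number_format = "0.00"  # formatted but empty cell; pushes max_row to 10

ref = used_range_ref(ws)
ws.add_table(Table(displayName="Masterdata", ref=ref))
```

If the sheet never contains formatted-but-empty cells, the shorter max_row / max_column version above gives the same reference.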

You can create a table like this:
from openpyxl.worksheet.table import Table, TableStyleInfo
tab = Table(displayName="Table1", ref="A1:E5")
# Add a default style with striped rows and banded columns
style = TableStyleInfo(name="TableStyleMedium9", showFirstColumn=False,
                       showLastColumn=False, showRowStripes=True, showColumnStripes=True)
tab.tableStyleInfo = style
ws1.add_table(tab)
Example from official Doc - https://openpyxl.readthedocs.io/en/stable/worksheet_tables.html

Related

Format and manipulate data across multiple Excel sheets in Python using openpyxl before converting to Dataframe

I need some help with editing the sheets within my Excel workbook in python, before I stack the data using pd.concat(). Each sheet (~100) within my Excel workbook is structured identically, with the unique identifier for each sheet being a 6-digit code that is found in line 1 of the worksheet.
I've already done the following steps to import the file, unmerge rows 1-4, and insert a new column 'C':
import openpyxl
import pandas as pd
wb = openpyxl.load_workbook('data_sheets.xlsx')
for sheet in wb.worksheets:
    # Copy the ranges into a list first: unmerging modifies sheet.merged_cells
    for merge in list(sheet.merged_cells):
        sheet.unmerge_cells(range_string=str(merge))
    sheet.insert_cols(3, 1)
    print(sheet)
wb.save('workbook_test.xlsx')
# concat once the worksheets have been edited
df = pd.concat(pd.read_excel('workbook_test.xlsx', sheet_name=None), ignore_index=True)
Before stacking the data, however, I would like to make the following additional (sequential) changes to every sheet:
Extract the right 8 characters from row 1 (the Excel equivalent would be =RIGHT(A1, 8)). This pulls the unique code off each sheet, which will look like '(000000)'.
Populate column C from rows 6-282 with the unique code.
Delete rows 1-5.
The end result would make each sheet within the workbook look like this:
Is this possible to do with openpyxl, and if so, how? Any direction or assistance with this would be much appreciated - thank you!
Here is a 100% openpyxl approach to achieve what you're looking for:
from openpyxl import load_workbook
wb = load_workbook("workbook_test.xlsx")
for ws in wb:
    ws.unmerge_cells("A1:O1")  # unmerge the first row through column O
    ws_uid = ws.cell(row=1, column=1).value[-8:]  # get the sheet's UID
    for num_row in range(6, 283):  # rows 6-282 inclusive
        ws.cell(row=num_row, column=3).value = '="{}"'.format(ws_uid)  # write the UID in column C
    ws.delete_rows(1, 5)  # delete the first 5 rows
wb.save("workbook_test.xlsx")
NB: This assumes there is already an empty column (C).
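If the sheets do not all end at row 282, the same steps can be driven by ws.max_row instead of a fixed range. The following is only a sketch under that assumption: the helper name tag_rows_with_code and the sample values are hypothetical, and it writes the code as a plain string rather than as a ="..." formula.

```python
from openpyxl import Workbook


def tag_rows_with_code(ws, header_rows=5, code_col=3, code_len=8):
    """Copy the trailing code from A1 into code_col on every data row, then drop the header rows."""
    title = str(ws.cell(row=1, column=1).value)
    code = title[-code_len:]  # same idea as =RIGHT(A1, 8)
    for row in range(header_rows + 1, ws.max_row + 1):
        ws.cell(row=row, column=code_col, value=code)
    ws.delete_rows(1, header_rows)
    return code


# Hypothetical sheet: a title in A1 and data starting at row 6
wb = Workbook()
ws = wb.active
ws["A1"] = "Site report (123456)"
for r in range(6, 9):
    ws.cell(row=r, column=1, value=f"item {r}")

code = tag_rows_with_code(ws)
```

After the call, the former row 6 is row 1 and column C holds the code on every remaining row, so the sheets can go straight into pd.concat.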

Openpyxl to delete Table row in Excel

I'm having a hard time figuring out how to delete an entire empty row when the row is part of an Excel Table.
I tried the following code, but it keeps the table formatting, and functions such as COUNTIF still count the blank rows.
from openpyxl import load_workbook as lw
wb = lw(file)
ws = wb['Sheet']
endrow = 10  # target row from which I will delete
# delete entire rows from endrow to the end of the sheet in one call
ws.delete_rows(endrow, ws.max_row - endrow + 1)
I want those rows to be completely blank, as in a default file: not part of a table and without formatting.
Regards.
I used this solution:
import xlwings as xw
from xlwings.constants import DeleteShiftDirection
app = xw.App(visible=False)
wb = app.books.open('PathtoFile')
sht = wb.sheets['SheetName']
endrow = XX  # row number from which to delete downward
# Delete from endrow through row 10,000
sht.range(str(endrow)+':10000').api.Delete(DeleteShiftDirection.xlShiftUp)
wb.save()
app.kill()
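The xlwings route drives a real Excel instance, so it needs Excel installed. If you want to stay in openpyxl, my understanding is that delete_rows does not touch table definitions, so the table's ref has to be shrunk by hand. A rough sketch, assuming a recent openpyxl where ws.tables is dict-like; the helper truncate_table, the table name "Remesas" and the sample data are made up for illustration:

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter, range_boundaries
from openpyxl.worksheet.table import Table


def truncate_table(ws, table_name, last_row):
    """Delete every row below last_row and shrink the table so it ends at last_row."""
    table = ws.tables[table_name]
    min_col, min_row, max_col, max_row = range_boundaries(table.ref)
    if last_row < max_row:
        # A single call removes the whole block of rows below last_row.
        ws.delete_rows(last_row + 1, ws.max_row - last_row)
        new_ref = (f"{get_column_letter(min_col)}{min_row}:"
                   f"{get_column_letter(max_col)}{last_row}")
        table.ref = new_ref
        # Tables loaded from a file usually carry an auto-filter with its own ref.
        if table.autoFilter is not None:
            table.autoFilter.ref = new_ref
    return table.ref


# Hypothetical in-memory workbook: header in row 1, data in rows 2..10
wb = Workbook()
ws = wb.active
ws.append(["id", "amount"])
for i in range(1, 10):
    ws.append([i, i * 100])
ws.add_table(Table(displayName="Remesas", ref="A1:B10"))

new_ref = truncate_table(ws, "Remesas", last_row=4)
```

I have not checked how this compares with the xlwings result for row heights or conditional formatting, so treat it as a starting point.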

Creating and naming column in openpyxl

Background -
The following code snippet iterates over all worksheets in a workbook and writes a formula to the last column of each:
import openpyxl
filename = 'filename.xlsx'
filename_output = 'filename_output.xlsx'
wb = openpyxl.load_workbook(filename)
for sheet in wb.worksheets:
    sheet.insert_cols(sheet.max_column)
    for row in sheet.iter_rows():
        row[-1].value = "=SUMIFS(J:J,M:M,#M:M)"
wb.save(filename_output)
Question - I cannot find documentation on how to name the column. Does anyone know how to achieve this?
Context - I want this column (in each worksheet) to be called 'Calculation'.
To get the last column, you can use sheet.max_column. Once you have updated the formulas, you can use sheet.cell(1, col).value = "Calculation" to set the header. Updated code below:
import openpyxl
filename = 'filename.xlsx'
filename_output = 'filename_output.xlsx'
wb = openpyxl.load_workbook(filename)
for sheet in wb.worksheets:
    sheet.insert_cols(sheet.max_column)
    for row in sheet.iter_rows():
        row[-1].value = "=SUMIFS(J:J,M:M,#M:M)"
    sheet.cell(1, sheet.max_column).value = "Calculation"  # added line, inside the outer for loop
wb.save(filename_output)
Output would look something like this.
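One thing to watch: insert_cols(idx) inserts before column idx, so insert_cols(sheet.max_column) pushes the existing last column one place to the right, and row[-1] then points at that shifted data rather than at the new blank column. If the goal is a brand-new rightmost column, a sketch like the following avoids insert_cols altogether. The helper name append_formula_column is hypothetical, and =SUM(B:B) is only a placeholder formula for the demo.

```python
from openpyxl import Workbook


def append_formula_column(ws, header, formula):
    """Add a new rightmost column: header in row 1, the formula in every row below."""
    new_col = ws.max_column + 1  # first column to the right of the existing data
    last_row = ws.max_row
    ws.cell(row=1, column=new_col, value=header)
    for row in range(2, last_row + 1):
        ws.cell(row=row, column=new_col, value=formula)
    return new_col


# Hypothetical in-memory sheet with a header row and two data rows
wb = Workbook()
ws = wb.active
ws.append(["item", "qty"])
ws.append(["bolt", 10])
ws.append(["nut", 25])

col = append_formula_column(ws, "Calculation", "=SUM(B:B)")
```

This assumes row 1 is the header row on every sheet; the existing columns are left untouched.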

Is there a way to write a dataframe starting from a specific cell in Python

I am trying to write a DataFrame into an existing Excel sheet, which will use my imported data to do some calculations. I tried to use the openpyxl library and the dataframe_to_rows function to do it.
It did write the DataFrame to the correct sheet, but not starting at the first cell (the Excel functions expect the data to start from a specific cell).
Here is my code:
import openpyxl as op
from openpyxl.utils.dataframe import dataframe_to_rows
sheet_name = ['Zones', 'Bilan', 'CTA', "Annuel", "Ventilation"]  # The names of the sheets that receive the DataFrames
vect_Data = [Data1, Data2, Data3, sortie_annuel, Ventil_data]  # The DataFrames to write into those sheets
wb = op.load_workbook(filename=nom_excel)
for i in range(len(sheet_name)):
    ws = wb[sheet_name[i]]
    # Clear the existing values
    for cells in ws:
        for cell in cells:
            cell.value = None
    for r in dataframe_to_rows(vect_Data[i], index=False, header=True):
        ws.append(r)
wb.save(filename=nom_excel)
Is there a way to force the dataframe_to_rows function to begin from a specific cell?
Thank you for your answers.
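A likely cause is that ws.append always writes below the last row the sheet already considers used, and setting cell.value = None clears the values without removing the cells. dataframe_to_rows itself only yields rows, so the position can be controlled by writing each value with ws.cell instead of ws.append. A minimal sketch; the helper name write_rows is made up, and a plain list of lists stands in for dataframe_to_rows(df, index=False, header=True) so the example runs without pandas:

```python
from openpyxl import Workbook


def write_rows(ws, rows, start_row=1, start_col=1):
    """Write an iterable of rows into ws with its top-left corner at (start_row, start_col)."""
    for r_offset, row in enumerate(rows):
        for c_offset, value in enumerate(row):
            ws.cell(row=start_row + r_offset, column=start_col + c_offset, value=value)


wb = Workbook()
ws = wb.active
# Stand-in for dataframe_to_rows(df, index=False, header=True)
rows = [["zone", "area"], ["north", 120], ["south", 95]]
write_rows(ws, rows, start_row=3, start_col=2)  # top-left corner at B3
```

In the loop from the question, the ws.append(r) part would become something like write_rows(ws, dataframe_to_rows(vect_Data[i], index=False, header=True), start_row=..., start_col=...) with whatever start cell the formulas expect.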

How to split merged Excel cells with Python?

I am trying to split only the merged cells in Excel file (with multiple sheets) that are like:
Please note that there are partially/fully empty rows. These rows are not merged.
Using openpyxl, I found the merged cell ranges in each sheet with this code:
from openpyxl import load_workbook
wb2 = load_workbook('Example.xlsx')
sheets = wb2.sheetnames  # ['Sheet1', 'Sheet2']
for i, sheet in enumerate(sheets):
    ws = wb2[sheets[i]]
    print(ws.merged_cell_ranges)
The print output:
['B3:B9', 'B13:B14', 'A3:A9', 'A13:A14', 'B20:B22', 'A20:A22']
['B5:B9', 'A12:A14', 'B12:B14', 'A17:A18', 'B17:B18', 'A27:A28', 'B27:B28', 'A20:A22', 'B20:B22', 'A3:A4', 'B3:B4', 'A5:A9']
Since I found the merged cell ranges, I need to split the ranges and fill in the corresponding rows like this:
How can I split like this using openpyxl? I am new to using this module. Any feedback is greatly appreciated!
You need to use the unmerge function. Example:
ws.unmerge_cells(start_row=2,start_column=1,end_row=2,end_column=4)
When you use the unmerge_cells function, sheet.merged_cells.ranges is modified, so don't iterate over sheet.merged_cells.ranges directly in the for loop. Take a copy of the coordinates first:
from openpyxl import load_workbook
from openpyxl.utils.cell import range_boundaries
wb = load_workbook(filename='tmp.xlsx')
for st_name in wb.sheetnames:
    st = wb[st_name]
    # Snapshot the merged ranges before unmerging anything
    mcr_coord_list = [mcr.coord for mcr in st.merged_cells.ranges]
    for mcr in mcr_coord_list:
        min_col, min_row, max_col, max_row = range_boundaries(mcr)
        top_left_cell_value = st.cell(row=min_row, column=min_col).value
        st.unmerge_cells(mcr)
        # Fill every cell of the former merged range with the top-left value
        for row in st.iter_rows(min_col=min_col, min_row=min_row, max_col=max_col, max_row=max_row):
            for cell in row:
                cell.value = top_left_cell_value
wb.save('merged_tmp.xlsx')
