Openpyxl crash excel file with table after save - python

I have created a basic workbook with one worksheet, and in this worksheet I have created a table (Insert>Table). Nothing complex in this table, just the values 1, 2, 3 (and the column header, of course).
I have written this simple code:
import openpyxl
thefilename = r"C:\Users\Myfile.xlsx"
book = openpyxl.load_workbook(thefilename)
sheet = book.active  # the attribute is "active"; "activesheet" does not exist
book.save(thefilename)
Then, when I try to open the Excel file, it is corrupted and cannot be reopened.
This looks like a bug, but I wonder how I can detect whether my Excel file has a table, and how to remove it?
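One possible way to answer the "detect and remove" part is sketched below. It assumes a reasonably recent openpyxl in which each worksheet exposes a dict-like `tables` mapping (older releases did not have it), and the helper names `find_tables` and `remove_tables` are made up for this illustration. The demo builds a workbook in memory rather than touching a real file.

```python
from openpyxl import Workbook
from openpyxl.worksheet.table import Table


def find_tables(book):
    """Return {sheet title: [(table name, cell range), ...]} for sheets with tables."""
    found = {}
    for ws in book.worksheets:
        tables = [(t.name, t.ref) for t in ws.tables.values()]
        if tables:
            found[ws.title] = tables
    return found


def remove_tables(book):
    """Drop every table definition; the cell values themselves stay in place."""
    for ws in book.worksheets:
        # Copy the keys first because the mapping is modified while looping.
        for name in list(ws.tables.keys()):
            del ws.tables[name]


# In-memory demo: one column with a header and the values 1, 2, 3.
book = Workbook()
ws = book.active
ws.append(["Value"])
for n in (1, 2, 3):
    ws.append([n])
ws.add_table(Table(displayName="Table1", ref="A1:A4"))

tables_before = find_tables(book)
remove_tables(book)
tables_after = find_tables(book)
```

With a real file, the same helpers would be called on the result of `openpyxl.load_workbook(thefilename)` before saving, ideally under a new file name so the original stays untouched.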

Related

openpyxl: How to add rows to an existing table (in an existing xlsx file) that doesn't start at 'A1'

I have an Excel 'file.xlsx' file:
with a sheet that has a named excel table, which is somewhere in the middle of the sheet, say C3.
a bunch of charts etc that use this table as source.
I have a pyspark DataFrame that I want to write to this table, so all the charts are updated and I have an Excel report.
I know how to do this using loops to set Cell.value one cell at a time. I'm hoping to find something less tedious.
Unsolved problems:
Update an existing table in an existing xlsx file.
Not lose/delete everything else in the xlsx file being updated.
(preferably) avoid iterating over the input tabular data and update excel cell-by-cell.
Things that didn't work for me:
pyspark.pandas.DataFrame.to_excel() problem: This overwrites the whole 'file.xlsx' and we lose all other sheets / charts etc.
df.toPandas().to_excel('file.xlsx', sheet_name=sheet_name, engine='openpyxl', index=False,
                       startcol=3, startrow=3)
openpyxl.utils.dataframe.dataframe_to_rows() problem: Starts pasting data at A1. Don't know how to update activeCell or current_row so append() starts from B3 instead of A1.
ws: Worksheet = openpyxl.open('file.xlsx').create_sheet(title=sheet_name)
for r in dataframe_to_rows(df.toPandas(), index=False, header=True):
    ws.append(r)
My current solution iter_rows() / iter_cols() / cell_range() / Worksheet.cell() problem: Loops over cell-by-cell.
I've read these and some more:
Appending data to existing tables in openpyxl
Manipulate existing excel table using openpyxl
Writing to row using openpyxl?
Write to an existing excel file using Openpyxl starting in existing sheet starting at a specific column and row
Openpyxl/Pandas - Convert CSV to XLSX
openpyxl convert CSV to EXCEL
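As far as I know, openpyxl has no bulk "paste at offset" call, so the sketch below still loops, but it hides the loop in one helper and then resizes the existing table to cover the new rows. It is only an illustration under some assumptions: the helper names `write_rows_at` and `resize_table` and the table name "Sales" are invented, the worksheet `tables` mapping needs a recent openpyxl, and the new data is assumed to have the same columns as the existing table. Plain lists stand in for the DataFrame; with a DataFrame, the rows could come from `dataframe_to_rows(df.toPandas(), index=False, header=True)`.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter
from openpyxl.worksheet.table import Table


def write_rows_at(ws, rows, start_row, start_col):
    """Write row sequences with the top-left cell at (start_row, start_col).

    Returns the (last_row, last_col) that were written.
    """
    last_row, last_col = start_row - 1, start_col - 1
    for r_offset, row in enumerate(rows):
        for c_offset, value in enumerate(row):
            ws.cell(row=start_row + r_offset, column=start_col + c_offset, value=value)
            last_col = max(last_col, start_col + c_offset)
        last_row = start_row + r_offset
    return last_row, last_col


def resize_table(ws, table_name, start_row, start_col, last_row, last_col):
    """Point an existing table (and its filter range, if any) at the new cell range."""
    table = ws.tables[table_name]
    table.ref = (
        f"{get_column_letter(start_col)}{start_row}:"
        f"{get_column_letter(last_col)}{last_row}"
    )
    # Precaution: keep the filter range in step with the table range.
    if table.autoFilter is not None:
        table.autoFilter.ref = table.ref
    return table.ref


# In-memory demo: a table that starts at C3 and grows from 1 to 3 data rows.
wb = Workbook()
ws = wb.active
write_rows_at(ws, [["id", "amount"], [1, 10]], start_row=3, start_col=3)
ws.add_table(Table(displayName="Sales", ref="C3:D4"))

new_rows = [["id", "amount"], [1, 10], [2, 20], [3, 30]]
last_row, last_col = write_rows_at(ws, new_rows, start_row=3, start_col=3)
new_ref = resize_table(ws, "Sales", 3, 3, last_row, last_col)
```

Because the workbook is loaded and saved as a whole, other sheets are kept; whether every chart survives a round trip through openpyxl is a separate question that this sketch does not settle.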

openpyxl is exporting excel file with error

I have an excel file where the first four rows contain some header text and the actual dataset starts from row 4. I am trying to build a simple function that reads the excel file and outputs the same excel file after deleting the first 4 rows.
This is what my code looks like before I put it into a function.
import pandas as pd
from openpyxl import load_workbook, Workbook
wb = load_workbook('FILEPATH/excel.xlsx')
ws = wb['Sheet1']
ws = ws.delete_rows(0,4)
wb.save(r"FILEPATH/deleted_row.xlsx")
When I run the code it executes without errors, but when I try to open the Excel file it gives me errors and says that the file is corrupted. A point to note is that the Excel file has some formatting on the top rows. Is that what is causing the issue?
Any help is appreciated.
EDIT: This is what the errors look like and the file does not open.
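In openpyxl, rows are numbered from 1, not 0. So, if you want to delete the first 4 rows, change the delete_rows() call from
ws = ws.delete_rows(0,4)
to
ws.delete_rows(1,4)
Note that delete_rows() modifies the worksheet in place and returns None, so its result should not be assigned back to ws.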
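Since the question asks for a function, here is a minimal sketch of one, assuming the goal is simply "load, drop the top rows, save under a new name". The function name `delete_top_rows` and the temporary demo files are invented for the example; it does not test whether the formatting on the top rows plays any part in the corruption.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def delete_top_rows(src_path, dst_path, sheet_name, n_rows=4):
    """Load a workbook, delete the first n_rows of one sheet, save a copy."""
    wb = load_workbook(src_path)
    ws = wb[sheet_name]
    # Rows are 1-indexed; delete_rows() works in place and returns None,
    # so the result is deliberately not assigned to ws.
    ws.delete_rows(1, n_rows)
    wb.save(dst_path)


# Demo on a throwaway workbook with six rows.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "excel.xlsx")
dst = os.path.join(tmp_dir, "deleted_row.xlsx")

demo = Workbook()
demo.active.title = "Sheet1"
for i in range(1, 7):
    demo.active.append([f"row {i}"])
demo.save(src)

delete_top_rows(src, dst, "Sheet1")
remaining = [row[0] for row in load_workbook(dst)["Sheet1"].iter_rows(values_only=True)]
```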

How to save excel file with openpyxl and preserve pivot table as is?

I have an Excel file: one sheet is used for writing data with Python, and another sheet contains a pivot table. I want to keep the pivot table exactly as it is in the source file.
The problem is that after saving the new workbook with openpyxl, when I open the Excel file and refresh the pivot table, it loses the 'Field settings..' -> 'Repeat items label' checkbox, and I have to turn it on manually each time. That is not very efficient; I would rather solve this with Python.
The sample file has it checked, but the checkbox seems to be cleared after saving a new file with openpyxl.
from openpyxl import load_workbook
from pathlib import Path
from datetime import date
import os
sample_file_path = Path('sample_excel.xlsx') # source excel
result_folder_path = Path('results')
wb = load_workbook(sample_file_path)
ws = wb["t_mm"] # worksheet with pivot table I want to preserve as is
# some manipulations to other worksheet
xlsx_filename = "test_my_file_%s.xlsx" % date.today().strftime('%d%m%Y')
completename = os.path.join(result_folder_path, xlsx_filename)
wb.save(completename)
I read the documentation https://openpyxl.readthedocs.io/en/stable/api/openpyxl.pivot.table.html, but couldn't figure out how to keep that checkbox. I am not an Excel or pivot table expert. I think the parameter I need is "showMultipleLabel=True", but from the docs I understand that it is "True" by default, so my checkbox should remain intact. Maybe it is another parameter?

Reading cell value without redefining it with Openpyxl

I need to read this .xlsm database, and some of the cell values I need are derived from Excel functions. To accomplish this I used:
from openpyxl import load_workbook
wb = load_workbook('file.xlsm', data_only=True, keep_vba=True)
ws = wb['Plan1']
And then, for every cell I wanted to read:
ws.cell(row=row, column=column).value
This works fine for getting the data out. But the problem comes with saving. When I do:
wb.save('file.xlsm')
It saves the file, but all the formulas inside the sheets are lost.
My dilemma is reading the cells' displayed values on one of the database's sheets without modifying them, writing the code's output to a new sheet, and saving it.
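Read the file twice: once in read-only, data-only mode to look at the values, and a second time with keep_vba=True (and without data_only) so that the formulas and the VBA are kept. Write your output into the second workbook and save it under a different name.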
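A minimal sketch of that two-pass approach follows. The helper names `read_cached_values` and `add_output_sheet` are invented, and the demo uses a throwaway .xlsx written by openpyxl, so `keep_vba` is left off there; for the real .xlsm it would be passed as `keep_vba=True`.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def read_cached_values(path, sheet_name):
    """Return the sheet's stored (last calculated) values as a list of row tuples."""
    wb = load_workbook(path, read_only=True, data_only=True)
    try:
        return [row for row in wb[sheet_name].iter_rows(values_only=True)]
    finally:
        wb.close()  # read-only workbooks keep the file open until closed


def add_output_sheet(src_path, dst_path, title, rows, keep_vba=False):
    """Reload with formulas intact, append a new sheet, save under another name."""
    wb = load_workbook(src_path, keep_vba=keep_vba)
    out = wb.create_sheet(title=title)
    for row in rows:
        out.append(list(row))
    wb.save(dst_path)


# Demo file: A1 holds a number, B1 a formula that refers to it.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "file.xlsx")
dst = os.path.join(tmp_dir, "file_with_output.xlsx")

demo = Workbook()
demo.active.title = "Plan1"
demo.active["A1"] = 21
demo.active["B1"] = "=A1*2"
demo.save(src)

# openpyxl never calculates formulas, so a file that Excel has not saved yet
# may show no cached result for B1 here; a file last saved by Excel carries one.
values = read_cached_values(src, "Plan1")
add_output_sheet(src, dst, "Output", [["source A1", values[0][0]]])

result = load_workbook(dst)
```

The original file is never written to, so its formulas stay as they were, and the copy keeps them because the second load did not use `data_only=True`.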

Using Python to load template excel file, insert a DataFrame to specific lines and save as a new file

I'm having trouble writing something that I believe should be relatively easy.
I have a template Excel file with a few sheets and some visualizations on it. I want to write a script that loads the template, inserts rows from an existing DataFrame into specific cells on each sheet, and saves the result as a new Excel file.
The template already has all the cells and the visualizations designed, so I want to insert only the data, without changing the design.
I tried several packages and none of them seemed to work for me.
Thanks for your help! :-)
I have written a package for inserting Pandas DataFrames to Excel sheets (specific rows/cells/columns), it's called pyxcelframe:
https://pypi.org/project/pyxcelframe/
It has very simple and short documentation, and the method you need is insert_frame
So, let's say we have a Pandas DataFrame called df which we have to insert in the Excel file ("MyWorkbook") sheet named "MySheet" from the cell B5, we can just use insert_frame function as follows:
from pyxcelframe import insert_frame
from openpyxl import load_workbook
workbook = load_workbook("MyWorkbook.xlsx")
worksheet = workbook["MySheet"]
insert_frame(worksheet=worksheet,
             dataframe=df,
             row_range=(5, 0),
             col_range=(2, 0))
A value of 0 as the second element of row_range or col_range means that no ending row or column is specified; if you need a specific ending row or column, replace 0 with it.
Sounds like a job for xlwings. You didn't post any test data, but modifying the code below to suit your needs should be quite straightforward.
import xlwings as xw
wb = xw.Book('your_excel_template.xlsx')
wb.sheets['Sheet1'].range('A1').value = df[your_selected_rows]
wb.save('new_file.xlsx')
wb.close()
