Python: adding a column to one sheet of an Excel file

I'm trying to add a single empty column to one sheet of an Excel file. The file has a specific structure that I can't change, and the column immediately after the insertion point is very narrow. The code below fails to insert the column when the target position lies between a standard-width column and that narrow column, but when I change the index so that it falls between two standard-width columns there is no issue.
How can I fix my code so the column is inserted correctly, or is there a better method?
from openpyxl import load_workbook
workbook = load_workbook('file.xlsx')
sheet = workbook.worksheets[8]
sheet.insert_cols(185)
workbook.save(filename='file.xlsx')
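One possible explanation, offered as an assumption rather than a confirmed diagnosis: insert_cols shifts cell contents to the right, but as far as I know it does not shift the widths stored in sheet.column_dimensions, so the narrow width stays attached to the old column letter and the layout looks wrong afterwards. A minimal sketch that inserts the column and then moves the explicit widths one column to the right. insert_col_keep_widths is a hypothetical helper name, not an openpyxl function, and the demo uses an in-memory workbook instead of file.xlsx:

```python
from openpyxl import Workbook
from openpyxl.utils import column_index_from_string, get_column_letter


def insert_col_keep_widths(ws, idx):
    """Insert one empty column at idx and shift explicit column widths right.

    Hypothetical helper, not part of openpyxl.
    """
    # Snapshot the explicitly stored widths, keyed by 1-based column index.
    widths = {
        column_index_from_string(letter): dim.width
        for letter, dim in ws.column_dimensions.items()
    }
    ws.insert_cols(idx)
    shifted = [col for col in widths if col >= idx]
    # Drop the stale entries first so the new column falls back to the default width.
    for col in shifted:
        del ws.column_dimensions[get_column_letter(col)]
    # Reattach each saved width one column further to the right.
    for col in shifted:
        ws.column_dimensions[get_column_letter(col + 1)].width = widths[col]


# Demo: column B is standard width, column C is narrow.
wb = Workbook()
ws = wb.active
ws["B1"] = "standard"
ws["C1"] = "narrow"
ws.column_dimensions["C"].width = 2
insert_col_keep_widths(ws, 3)  # new empty column C, old C becomes D
```

The sketch only moves the width attribute. Other column settings (hidden flags, styles, widths shared by a grouped range of columns in the file) are not handled, so it is worth checking the result on a copy of the workbook first.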

Related

openpyxl: How to add rows to an existing table (in an existing xlsx file) that doesn't start at 'A1'

I have an Excel 'file.xlsx' file:
with a sheet that has a named excel table, which is somewhere in the middle of the sheet, say C3.
a bunch of charts etc that use this table as source.
I have a pyspark DataFrame that I want to write to this table, so all the charts are updated and I have an Excel report.
I know how to do this using loops to set Cell.value one cell at a time. I'm hoping to find something less tedious.
Unsolved problems:
Update an existing table in an existing xlsx file.
Not lose/delete everything else in the xlsx file being updated.
(preferably) avoid iterating over the input tabular data and update excel cell-by-cell.
Things that didn't work for me:
pyspark.pandas.DataFrame.to_excel() problem: This overwrites the whole 'file.xlsx' and we lose all other sheets / charts etc.
df.toPandas().to_excel('file.xlsx', sheet_name=sheet_name, engine='openpyxl', index=False,
                       startcol=3, startrow=3)
openpyxl.utils.dataframe.dataframe_to_rows() problem: Starts pasting data at A1. Don't know how to update activeCell or current_row so append() starts from B3 instead of A1.
ws: Worksheet = openpyxl.open('file.xlsx').create_sheet(title=sheet_name)
for r in dataframe_to_rows(df.toPandas(), index=False, header=True):
    ws.append(r)
My current solution iter_rows() / iter_cols() / cell_range() / Worksheet.cell() problem: Loops over cell-by-cell.
I've read these and some more:
Appending data to existing tables in openpyxl
Manipulate existing excel table using openpyxl
Writing to row using openpyxl?
Write to an existing excel file using Openpyxl starting in existing sheet starting at a specific column and row
Openpyxl/Pandas - Convert CSV to XLSX
openpyxl convert CSV to EXCEL
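A minimal sketch of writing tabular data at an offset instead of at A1, assuming plain openpyxl and a list of rows. write_rows_at is a hypothetical helper name, and the demo uses an in-memory workbook rather than file.xlsx. With a DataFrame, the rows could come from dataframe_to_rows(df.toPandas(), index=False, header=True). It still visits every cell, but the loop lives in one place:

```python
from openpyxl import Workbook


def write_rows_at(ws, rows, start_row, start_col):
    """Write an iterable of rows with its top-left corner at (start_row, start_col).

    Hypothetical helper, not part of openpyxl. Indices are 1-based.
    """
    for r_offset, row in enumerate(rows):
        for c_offset, value in enumerate(row):
            ws.cell(row=start_row + r_offset,
                    column=start_col + c_offset,
                    value=value)


wb = Workbook()
ws = wb.active
rows = [["id", "name"], [1, "a"], [2, "b"]]  # stand-in for the DataFrame rows
write_rows_at(ws, rows, start_row=3, start_col=3)  # top-left corner at C3
```

If the target is a named Excel table and the number of rows changes, the table's range would also need to be updated to match; that step is not shown here.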

How to append data to the last row (every time) of an Excel file?

I am looking for a way to append data from a Python program to an Excel sheet. I chose the openpyxl library for this.
My problem is how to write new data to the last row of the sheet without losing the data that is already there. I looked through the documentation but did not find an answer.
I do not know whether the library has a method for adding new data or whether I need to write the logic for this myself.
The last row of the sheet can be found using the max_row property:
from openpyxl import load_workbook
myFileName=r'C:\DemoFile.xlsx'
#load the workbook, and put the sheet into a variable
wb = load_workbook(filename=myFileName)
ws = wb['Sheet1']
# max_row is a worksheet property that returns the index of the last used row
newRowLocation = ws.max_row + 1
# write to the cell you want, specifying row, column, and value
ws.cell(column=1, row=newRowLocation, value="aha! a new entry at the end")
wb.save(filename=myFileName)
wb.close()
What you're looking for is the Worksheet.append method:
Appends a group of values at the bottom of the current sheet.
If it’s a list: all values are added in order, starting from the first column
If it’s a dict: values are assigned to the columns indicated by the keys (numbers or letters)
So no need to check for the last row. Just use this method to always add the data at the end.
ws.append(["some", "test", "data"])
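A short sketch of both forms of append described above, using an in-memory workbook so it runs without an existing file (an assumption for illustration; with a real file you would use load_workbook and save as in the first answer):

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# A list fills columns A, B, C, ... of the next free row.
ws.append(["some", "test", "data"])
# A dict places values in the named columns; the other columns stay empty.
ws.append({"A": "first", "C": "third"})

# max_row is a property, so it is read without parentheses.
last_row = ws.max_row
```

Each call writes one row below the previous one, so there is no need to track the row index by hand.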

How to pull last cell in column using openpyxl in python

I created a small program that writes to an Excel file. I have another program that needs to read the last entry (in column A) every day. Since new data is imported into the Excel file every day, the cell that I need to capture changes each time.
I'm looking to see if there is a way for me to grab the last cell in Column A using openpyxl in python?
I don't have much experience with this, so I wasn't sure where to start.
import openpyxl
wb = openpyxl.load_workbook('text.xlsx')
sheet = wb['Sheet1']
from https://openpyxl.readthedocs.io/en/stable/tutorial.html
Try this; it should get the entire A column and take the last entry (read its contents with .value):
sheet['A'][-1]
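One caveat: sheet['A'] runs down to the last used row of the whole sheet, so if another column is longer than column A, sheet['A'][-1] is an empty cell. A minimal sketch that walks column A from the bottom and returns the last cell with a value. last_filled_cell is a hypothetical helper name, and the demo uses an in-memory workbook instead of text.xlsx:

```python
from openpyxl import Workbook


def last_filled_cell(ws, column="A"):
    """Return the lowest cell in the column that holds a value, or None.

    Hypothetical helper, not part of openpyxl.
    """
    for cell in reversed(ws[column]):
        if cell.value is not None:
            return cell
    return None


wb = Workbook()
sheet = wb.active
sheet["A1"] = "monday"
sheet["A2"] = "tuesday"
sheet["B5"] = "column B is longer than column A"

naive = sheet["A"][-1]          # cell A5, which is empty
last = last_filled_cell(sheet)  # cell A2
```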

Using Python to load template excel file, insert a DataFrame to specific lines and save as a new file

I'm having trouble writing something that I believe should be relatively easy.
I have a template Excel file with a few sheets and some visualizations on them. I want to write a script that loads the template, inserts rows from an existing dataframe into specific cells on each sheet, and saves the result as a new Excel file.
The template already has all the cells and the visualizations designed, so I want to insert only the data without changing the design.
I tried several packages and none of them seemed to work for me.
Thanks for your help! :-)
I have written a package for inserting Pandas DataFrames to Excel sheets (specific rows/cells/columns), it's called pyxcelframe:
https://pypi.org/project/pyxcelframe/
It has very simple and short documentation, and the method you need is insert_frame
So, let's say we have a Pandas DataFrame called df which we have to insert in the Excel file ("MyWorkbook") sheet named "MySheet" from the cell B5, we can just use insert_frame function as follows:
from pyxcelframe import insert_frame
from openpyxl import load_workbook
workbook = load_workbook("MyWorkbook.xlsx")
worksheet = workbook["MySheet"]
insert_frame(worksheet=worksheet,
             dataframe=df,
             row_range=(5, 0),
             col_range=(2, 0))
A 0 as the second element of row_range or col_range means that no ending row or column is specified; if you need a specific ending row or column, replace 0 with it.
Sounds like a job for xlwings. You didn't post any test data, but modifying the code below to suit your needs should be quite straightforward.
import xlwings as xw
wb = xw.Book('your_excel_template.xlsx')
wb.sheets['Sheet1'].range('A1').value = df[your_selected_rows]
wb.save('new_file.xlsx')
wb.close()
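For comparison, a minimal sketch of the same load, fill, save-as-new-file flow using only openpyxl. Everything here is an assumption for illustration: the template is generated on the fly in a temporary directory, the sheet name and start cell (B5) are borrowed from the pyxcelframe example above, and rows stands in for the dataframe contents:

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font

tmp_dir = tempfile.mkdtemp()
template_path = os.path.join(tmp_dir, "template.xlsx")
output_path = os.path.join(tmp_dir, "report.xlsx")

# Stand-in template with one pre-formatted cell.
template = Workbook()
template.active.title = "MySheet"
template.active["B5"].font = Font(bold=True)
template.save(template_path)

# Load the template, fill in values starting at B5, save under a new name.
wb = load_workbook(template_path)
ws = wb["MySheet"]
rows = [["x", 1], ["y", 2]]  # hypothetical data, e.g. taken from a DataFrame
for r, row in enumerate(rows, start=5):
    for c, value in enumerate(row, start=2):
        # Assigning .value leaves the cell's existing style in place.
        ws.cell(row=r, column=c).value = value
wb.save(output_path)

# Read both files back to compare them.
result = load_workbook(output_path)["MySheet"]
untouched = load_workbook(template_path)["MySheet"]
```

Saving to a different path leaves the template file itself unchanged. Whether charts and other drawings survive an openpyxl load and save depends on the openpyxl version, so it is worth testing with a copy of the real template.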

Pandas update excel file with formatting

I have an Excel file which has multiple sheets and special formatting (colors, symbols, etc.).
In Python I know that I can read the file into a data frame, update certain columns and then write the file back, but the file gets overwritten and loses all formatting.
Is there a way to open the file and just update the values of certain columns, keeping other sheets untouched and formatting as is?
Yes, you can use openpyxl, for example:
from openpyxl import load_workbook
# raw strings keep the backslash in the Windows path from being read as an escape
wb = load_workbook(filename=r'mypath\myfile.xlsx')
ws = wb.worksheets[0]
ws["A1"].value = 2
wb.save(r'mypath\myfile.xlsx')
Where the cell A1 has a particular format. Its format stays the same and only the value of the cell changes.
To read the value of the cell, you can use this:
ws.cell(row=row_number, column=column_number).value
To change values of a column with a for loop, this is an option:
new_data = ['a', 'b', 'c', 'd']
# zip stops at the shorter sequence, so extra cells in column A are left untouched
for cell, value in zip(ws['A'], new_data):
    cell.value = value
