Opening an Excel File in Python Disables Dynamic Arrays

Opening an Excel File in Python Disables Dynamic Arrays - python

I have an excel workbook that uses functions like OFFSET, UNIQUE, and FILTER which spill into other cells. I'm using python to analyze and write some data to the workbook, but after doing so these formulas revert into normal arrays. This means they now take up a fixed number of cells (however many they took up before opening the file in python) instead of adjusting to fit all of the data. I can revert the change by selecting the formula and hitting enter, but there are many of these formulas it's more work to fix them than to just print the data to a text file and paste it into excel manually. Is there any way to prevent this behavior?
I've been using openpyxl to open and save the workbook, but after encountering this issue also tried xlsxwriter and the dataframe to excel function from pandas. Both of them had the same issue as openpyxl. For context I am on python 3.11 and using the most recent version of these modules. I believe this issue is on the Python side and not the Excel side, so I don't think changing Excel settings will help, but maybe there is something there I missed.
Example:
I've created an empty workbook with two sheets, one called 'main' and one called 'input'. The 'main' sheet will analyze data from the 'input' sheet which will be entered with openpyxl. The data will just be values in the first column.
In cell A1 of the 'main' sheet, enter =OFFSET(input!A1,0,0,COUNTA(input!A:A),1).
This formula will just show a copy of the data. Since there currently isn't any data it gives a #REF! error, so it only takes up one cell.
Now I'll run the following python code to add the numbers 0-9 into the first column of the input sheet:
from openpyxl import load_workbook
wb = load_workbook('workbook.xlsx')
ws = wb['input']
for i in range(10):
ws.append([i])
wb.save('workbook_2.xlsx')
When opening the new file, cell A1 on the 'main' sheet only has the first value, 0, instead of the range 0--9. When selecting the cell, you can see the formula is now {=OFFSET(input!A1,0,0,COUNTA(input!A:A),1)}. The curly brackets make it an array, so it wont spill. By hitting enter in the formula the array is removed and the sheet properly becomes the full range.
If I can get this simple example to work, then expanding it to the data I'm using shouldn't be a problem.

Related

Appending Excel cell values using pandas

Edit: I found out a solution to my question. More or less look at the user manual for openPyxl instead of online tutorials, the tutorials ran errors when I tried them (I tried more than one) and their thought process was significantly different from the thought process in the user manual. And also I ended up not using pandas as much as I thought I would need to.
I am trying to append certain values in an Excel file with multiple sheets based on user inputs and then rewrite it to the Excel file (without deleting the rest of the sheets). So far I have tried this which seems to combine the data but I didn't quite see how it applied to what I am doing since I want to append a part of a sheet instead of rewrite the whole excel file. I have also tried a few other things with ExcelWriter but I don't quite understand it since it usually wipes all the data in the file (I may be using it wrong).
episode_dataframe = pd.read_excel (r'All_excerpts (Siena Copy)_test.xlsx', sheet_name=episode)
#episode is a specified string inputted by user, this line makes a data frame for the specified sheet
episode_dataframe.loc[(int(pass_num) - 1), 'Resources'] = resources
#resources is also a user inputted string, it's what I am trying to append the spreadsheet cell value to, this appends to corresponding data frame
path_R = open("All_excerpts (Siena Copy)_test.xlsx", "rb")
with pd.ExcelWriter(path_R) as writer:
writer.book = openpyxl.load_workbook(path_R)
#I copied this from [here][3], i think it should make the writer for the to_excel? I don't fully know
episode_dataframe.to_excel(writer, sheet_name=episode, engine=openpyxl, if_sheet_exsits ='replace')
#this should write the sheet data frame onto the file, but I don't want it to delete the other sheets
Additionally, I have been running into a bunch of other smaller errors, a big one was Workbook' object has no attribute 'add worksheet' even though I'm not trying to add a worksheet, also I could not get their solution to work.
I am a bit of a novice at python, so my code might be a bit of a mess.

How to pull last cell in column using openpyxl in python

I created a small program that writes to an excel file. I have another program that needs to read the last entry (in column A) every day. Since there is a new data imported into the excel file every day, the cell that I need to capture is different.
I'm looking to see if there is a way for me to grab the last cell in Column A using openpyxl in python?
I don't have much experience with this, so I wasn't sure where to start.
import openpyxl
wb = openpyxl.load_workbook('text.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')

from https://openpyxl.readthedocs.io/en/stable/tutorial.html
try this, it should get the entire A column and take the last entry:
sheet['A'][-1]

Reading cell value without redefining it with Openpyxl

I need to read this .xlsm database and some of the cells values I need are derived from Excel functions. To accomplish this I used:
from openpyxl import load_workbook
wb = load_workbook('file.xlsm', data_only=True, keep_vba=True)
ws = wb['Plan1']
And then, for every cell I wanted to read:
ws.cell(row=row, column=column).value
This works fine for getting the data out. But the problem comes with saving. When I do:
wb.save('file.xlsm')
It saves the file, but all the formulas inside the sheets are lost
My dilemma is reading the cell's displayed values on one of the database's sheet without modifying them, writing the code's output in a new sheet and saving it.

Read the file once in read-only and data-only mode to look at the values and another time keeping the VBA around. And save under a different name.

Use a value from a cell which is written with formulas

I have two columns in Excel. The first(column C) has cells with values, the second one(column B), I had used a script to extract some values from the first one with Excel formulas.
Now I want to use the values from the second column in another column and the script doesn't have any errors but gives me empty cells because the second column contains formulas.
Is it possible to paste values or to extract only the values from the second column?
Here is my code:
for i in range(0,len(listaunica)):
ws4.cell(row=i+1,column=3).value=listaunica[i]
for i in range(0,len(listaunica)):
ws4.cell(row=i+1,column=2).value='=iferror(find(".",C{0}),C{0})'.format(i+1)
Can someone help me with this?

I do not fully understand your situation, so I will explain some possibilities:
(1) You have an Excel workbook that was saved using Excel itself. In this case, column B should have both formulas and the results of those formulas, because Excel would have calculated them.
(2) You have an Excel workbook that was saved using some other method, such as being written by OpenPyXL, and has not (yet) been opened and saved by Excel. In this case, you most likely have either formulas or results stored in column B.
When you are reading using OpenPyXL, you have to choose whether you want formulas or results. This is controlled by the data_only parameter. Set this to True if you want just the results. If your workbook was saved in Excel, and thus has both formulas and results, then the way to read them both in OpenPyXL is to open the workbook twice, once with data_only=False and once with data_only=True. Cumbersome, but that is how OpenPyXL is designed.
If you have a workbook from scenario (2), and column B still looks like it has formulas, then most likely trying to open the workbook using data_only=True will just return zeros for column B. You won't be able to get the results from this workbook until you open it in Excel and then save it.

Try this
for i in range(0,len(listaunica)):
ws4.cell(row=i+1,column=3).value=listaunica[i]
for i in range(0,len(listaunica)):
ws4.cell(row=i+1,column=2).value='=iferror(find(".",C{0}),C{0})'.format(i+1)
ws4.cell(row=i+1,column=2).value = ws4.cell(row=i+1,column=2).value
For reference Does .Value = .Value act similar to Evaluate() function in VBA??

How to apply conditional formatting in openpyxl?

I am using openpyxl to manipulate a Microsoft Excel Worksheet.
What I want to do is to add a Conditional Formatting Rule that fills the rows with a given colour if the row number is even, leaves the row blank if not.
In Excel this can be done by selecting all the worksheet, creating a new formatting rule with the text =MOD(ROW();2)=0 or =EVEN(ROW()) = ROW().
I tried to implement this behaviour with the following lines of code (considering for example the first 10 rows):
redFill = PatternFill(start_color='EE1111', end_color='EE1111', fill_type='solid')
ws2.conditional_formatting.add('A1:A10', FormulaRule(formula=['MOD(ROW();2) = 0'], stopIfTrue=False, fill=redFill))
My program runs correctly but when I try to open the output Excel file, it tells me that the file contains unreadable content and it asks me if I want to recover the worksheet content. By clicking yes, the worksheet is what I expect but there is no formatting.
What is the correct way to apply such a formatting in openpyxl (possibly to the entire worksheet)?

Unfortunately, the way formulae are handled in conditional formatting is particularly opaque. The best thing to do is to create a file with the relevant conditional format and inspect the relevant file by unzipping it. The rules are stored in the relevant worksheet files and the formats in the styles file.
However, I suspect that the problem may simply because you are using ";" to separate parameters in the function: you must always use commas for this.
A sample formula from one of my projects:
green_text = Font(color="006100")
green_fill = PatternFill(bgColor="C6EFCE")
dxf2 = DifferentialStyle(font=green_text, fill=green_fill)
r3 = Rule(type="expression", dxf=dxf2)
r3.formula = ["AND(ISNUMBER(C2), C2>=400)"]

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.