OpenPyXl removing formulas on load - python

I'm trying to use OpenPyXL to:
1. Open an .xlsx file
2. Read a cell that I know contains a number
3. Write a different number to that cell
4. Save the result to the same or a differently named .xlsx file
However, even if I only perform the first and last steps, the resulting .xlsx file has all its formulas removed. The simplest version of my code is:
from openpyxl import load_workbook
wb = load_workbook(filename=file_path, data_only=False, guess_types=False)
wb.save(file_path_new)
Even without changing anything, I still lose all the formulas. I have tried different values for these options. What puzzles me most is that only yesterday the full code (including reading and writing a numerical cell) was working, and the saved result had the new number in that cell when viewed in Excel.
I updated from 1.8.5 to 2.0.2 at some point, but I can't remember whether this was before or after the original code worked.
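For comparison, here is a minimal sketch of the four steps above against a current openpyxl 3.x release. This is an assumption: the question concerns 1.8.5 and 2.0.2, and current releases no longer accept guess_types. With data_only=False, which is the default, formula cells are loaded as formula strings and written back as formulas. Loading with data_only=True and then saving would instead replace each formula with its cached value, or with nothing if the file has no cached values. The in-memory files and cell addresses are hypothetical stand-ins for file_path and file_path_new.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook


def round_trip():
    """Open, read a number, write a new one, save, then reload to check."""
    # Hypothetical source workbook built in memory: A1 holds a number,
    # A2 holds a formula that depends on it.
    wb = Workbook()
    ws = wb.active
    ws["A1"] = 5
    ws["A2"] = "=A1*2"
    src = BytesIO()
    wb.save(src)
    src.seek(0)

    # Step 1: open. data_only=False (the default) keeps formula strings.
    wb = load_workbook(src, data_only=False)
    ws = wb.active
    # Steps 2 and 3: read the number and write a different one.
    old_value = ws["A1"].value
    ws["A1"] = old_value + 1
    # Step 4: save to a new in-memory "file".
    dst = BytesIO()
    wb.save(dst)
    dst.seek(0)

    # Reload to see what was actually written.
    ws = load_workbook(dst).active
    return old_value, ws["A1"].value, ws["A2"].value


print(round_trip())
```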

Related

openpyxl issue with formula output

I have a ROM estimator tool I've made that takes data, which is then output to an Excel file with openpyxl. The problem I keep running into is that when I output an XLOOKUP function to my Excel file and go to open it, Excel throws an error on the sheet where the formula is placed. When I let Excel fix it, Excel deletes the formula, and I haven't been able to find a workaround.
At first I thought the function was failing because of the order in which it was placed into the Excel file. The XLOOKUP function populates a column of cells on the first page of the workbook. I tried different functions that do more or less the same thing, but no matter the function, I got the same "Excel found a problem with one or more formula references in this worksheet" error. I also tried adding the formula after the rest of the workbook was populated, in case it was a data error and the data it was trying to access didn't exist yet. So I put the formula at the very end of my code, once everything else had been populated, and I still got the same error message.
What doesn't make sense is that if I don't populate the cell with the problematic formula using openpyxl, and instead enter the formula manually once the file is open and the rest of the worksheet is populated, it works totally fine. It just seems that when I write the formula with openpyxl, I can't open the document without Excel removing the formula to fix it.
Let me know if anyone has anything I could try to fix this issue. Right now I write the formula with openpyxl without the leading "=", and add the "=" once I open the file.
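A likely cause worth checking: XLOOKUP is a newer Excel function, and in the .xlsx file format, functions added after the original specification are stored with an `_xlfn.` prefix. openpyxl writes the formula text exactly as given, so `=XLOOKUP(...)` reaches the file without the prefix, which Excel may then reject. The sketch below writes the prefixed name instead. The lookup table in A2:B4, the key in D2 and the formula in E2 are all hypothetical.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook


def build_lookup_sheet():
    """Write a small lookup table plus an XLOOKUP formula using the _xlfn. prefix."""
    wb = Workbook()
    ws = wb.active
    # Hypothetical lookup table: item names in column A, prices in column B.
    ws.append(["Item", "Price"])
    for item, price in [("apple", 10), ("pear", 20), ("plum", 30)]:
        ws.append([item, price])
    ws["D2"] = "pear"  # value to look up
    # Newer functions need the _xlfn. prefix in the file; Excel versions that
    # support XLOOKUP should display it as plain =XLOOKUP(...) once opened.
    ws["E2"] = "=_xlfn.XLOOKUP(D2,A2:A4,B2:B4)"
    buf = BytesIO()
    wb.save(buf)
    buf.seek(0)
    return buf


formula = load_workbook(build_lookup_sheet()).active["E2"].value
print(formula)
```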

Overwriting subsection of Excel file with pandas dataframe

I have a large (90-120 MB) Excel file that I use to generate a monthly report. Every month, new data comes in that needs to be inserted into columns A:K on one sheet, with formulas from column J on that create some sub-report items. In the past, the new data was cleaned manually in Excel; however, I have been writing a Python script that will do this heavy lifting for me. The data is now cleaned, but I cannot figure out how to export it to the specific range it needs to go into without overwriting all the other data.
I've tried:
import openpyxl

wb = openpyxl.load_workbook('workbook.xlsx')
ws = wb['sheet']
# Clear columns A:K (1-11) from row 5 to the last used row.
for row in ws.iter_rows(min_row=5, max_row=ws.max_row, min_col=1, max_col=11):
    for cell in row:
        cell.value = None
wb.save('workbook.xlsx')
However, my issue is that the file cannot be loaded with openpyxl. The command runs for roughly an hour before throwing a memory error, and I know this code works on a smaller file.
I've read nearly every thread I could find via Google, and nothing has worked yet. I am aware that openpyxl has a write_only mode, but as far as I can tell from the documentation, it allows neither the use of a pre-existing file nor the targeting of specific cells.
I've been able to do similar things via R and STATA which I am more familiar with, but for this specific project Python is mandatory for the automation.
Any help would be appreciated.
Python 3.9.7
Pandas 1.3.5
Openpyxl 3.0.9
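On the second half of the question (writing only into A:K), here is a sketch of one approach, assuming the workbook does load. It does not address the memory error itself: in normal mode openpyxl holds the whole workbook in memory, and read-only mode is for reading and cannot be used to save changes. The idea is to clear and then write cells one at a time with `ws.cell(row=..., column=..., value=...)`, so columns outside the target range keep their formulas. The helper name `write_block` and the demo data are hypothetical; rows from a DataFrame could be supplied as `df.itertuples(index=False, name=None)`.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook


def write_block(ws, rows, first_row=5, first_col=1, last_col=11):
    """Clear columns first_col..last_col from first_row down, then write rows there.

    Cells outside those columns (e.g. formula columns to the right) are untouched.
    """
    # Clear old data in the target columns only.
    for row in ws.iter_rows(min_row=first_row, max_row=ws.max_row,
                            min_col=first_col, max_col=last_col):
        for cell in row:
            cell.value = None
    # Write the new rows cell by cell, starting at (first_row, first_col).
    width = last_col - first_col + 1
    for r_offset, values in enumerate(rows):
        for c_offset, value in enumerate(values[:width]):
            ws.cell(row=first_row + r_offset, column=first_col + c_offset, value=value)


# Hypothetical demo: old data in A5:B6 and a formula in L5 (column 12, outside A:K).
wb = Workbook()
ws = wb.active
ws["A5"] = "old"
ws["B5"] = 1
ws["A6"] = "old"
ws["L5"] = "=SUM(B5:B10)"
write_block(ws, [("new", 2), ("new", 3)])

buf = BytesIO()
wb.save(buf)
buf.seek(0)
ws = load_workbook(buf).active
print(ws["A5"].value, ws["B6"].value, ws["L5"].value)
```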

Can a pycel object be saved as an Excel Workbook

After parsing an Excel file in Python and evaluating the workbook using pycel, can the pycel object be saved as an Excel file that keeps all the original formatting, etc.? I.e., only the values need to be updated.
TL;DR
No, you cannot save a pycel object back into Excel.
Why not?
The basic problem is that pycel is based on openpyxl, which is used to read (and, if needed, write) Excel spreadsheets. However, while openpyxl has the computed values available for formula cells in a workbook it has read, it does not really allow those computed values to be saved back into a workbook it writes. It doesn't really make sense to save a different computed value for a formula cell anyway, since the cell's value will be recomputed once the file is opened in Excel.
While it is true that pycel has the information available to properly populate a new value when the workbook is written, it evidently is not a use case that was important to the openpyxl authors or contributors.
Please note that the openpyxl maintainers gladly took pull requests to make it run better with pycel. It seems likely they would be open to discussing a PR for writing values into workbooks.
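The limitation can be seen with openpyxl alone. When openpyxl writes a formula cell, it stores the formula but no computed result, so reading the saved file back with data_only=True returns None for that cell until an application that calculates, such as Excel, recomputes and saves the file. A minimal sketch with a hypothetical in-memory workbook:

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook


def saved_formula_views():
    """Save a formula with openpyxl, then read it back both ways."""
    wb = Workbook()
    ws = wb.active
    ws["A1"] = 21
    ws["A2"] = "=A1*2"  # openpyxl does not calculate this
    buf = BytesIO()
    wb.save(buf)

    buf.seek(0)
    as_formula = load_workbook(buf).active["A2"].value
    buf.seek(0)
    as_value = load_workbook(buf, data_only=True).active["A2"].value
    return as_formula, as_value


print(saved_formula_views())
```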

How to programmatically copy excel worksheet sheet to a new one

I have created a Python script that parses some data and uses it to generate a few Excel workbooks (.xlsx). (I am not very knowledgeable about Excel, by the way.)
The original workbook I am reading the data from has a main sheet used to fill in lots of information, which is then distributed across many other sheets that apply formulas to calculate some results.
My Python script does the processing I want by automatically filling in this main sheet, which then produces the results on the other sheets. I then save the workbook, and everything looks fine.
Now I want to split those sheets into individual workbooks, without including the main sheet in any of them. This appears to be pretty challenging.
I first tried the data_only argument of load_workbook, but I quickly discovered that to get the values rather than the formulas (or just None back), I'd have to manually open and save each file in Excel so that the cached values get stored. That won't work in my case, since I want the whole thing to be automated.
Of everything I tried, the approach that came closest is this piece of code:
import pandas as pd
import xlwings as xw

workbook = xw.Book('generatedFiles/generated.xlsx')
sheet1 = workbook.sheets['PRIM'].used_range.value
df = pd.DataFrame(sheet1)
df.to_excel('generatedFiles/fixed-generated.xlsx', index=False, header=False)
This does generate a spreadsheet with the values, via a pandas DataFrame, but the problem is that it doesn't preserve any information about the types of the values. For example, an integer is treated as a string when saved.
This breaks the processing done by an external parser that I feed these files to.
Any ideas on how to fix that? What would be the best way to go about something like this?
Thanks!
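One option that skips the DataFrame step, offered as a sketch rather than a tested fix for the external parser: write the sheet's values straight into a new workbook with openpyxl, so Python ints, floats and strings are stored as the corresponding Excel types. It assumes the computed values are already available as a 2D list, such as the one `used_range.value` returns. Since xlwings reads numbers as floats by default, whole numbers arrive as values like 3.0, so the hypothetical helper `rows_to_workbook` converts whole-number floats back to int. The demo rows are made up.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook


def rows_to_workbook(rows, title="PRIM"):
    """Write a 2D list of values to a new single-sheet workbook, keeping types.

    Whole-number floats (e.g. 3.0) are converted to int; other values are
    written as they are.
    """
    wb = Workbook()
    ws = wb.active
    ws.title = title
    for row in rows:
        ws.append([int(v) if isinstance(v, float) and v.is_integer() else v
                   for v in row])
    return wb


# Hypothetical values, shaped like a used_range.value result.
rows = [["name", "count", "ratio"], ["alpha", 3.0, 0.5], ["beta", 7.0, None]]
buf = BytesIO()
rows_to_workbook(rows).save(buf)
buf.seek(0)
ws = load_workbook(buf)["PRIM"]
print([cell.value for cell in ws[2]])
```

Each sheet to be split out could be written to its own file this way, leaving the main sheet out.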

How to parse cell format data from a .xlsx file using xlrd

I am relatively new to Python, and I am trying to read information from an Excel sheet to generate a graph. So far I am using the most current version of the xlrd library (0.9.4) in a nested for loop to grab the value from each cell. However, I am unsure how to access the formatting information for each cell.
For example, if a cell is formatted to display as currency in the Excel file, the standard sheet.cell(row, column).value from xlrd returns only 5.0 instead of $5.00.
I found here that you can set the formatting_info parameter to True when opening the workbook to see some of the format information. However, I am primarily using Excel 2013, and my sheets are saved as .xlsx files by default. According to this issue on GitHub, support for formatting_info has not yet been implemented for .xlsx files.
Is there any way around the formatting_info flag, or any other way I can detect when a format (currency, specifically) has been used, so that I can reflect it in my graphs? I am aware that it is possible to convert .xlsx files to .xls files, as shown here, but I am concerned about losing information or formatting.
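Switching libraries is one way around this. openpyxl reads .xlsx files directly and exposes each cell's number format string as `cell.number_format`, which can be inspected for a currency symbol. (xlrd 2.0 and later read only .xls files.) The sketch below builds its demo workbook in memory. The check in `looks_like_currency` is a hypothetical heuristic, not a complete parser of Excel format codes, and the `$` label assumes a dollar format.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook


def looks_like_currency(number_format):
    """Rough heuristic: treat formats containing a currency symbol as currency."""
    return any(sym in number_format for sym in ("$", "€", "£", "¥"))


# Hypothetical demo file: B1 is shown as currency in Excel, B2 is a plain number.
wb = Workbook()
ws = wb.active
ws["B1"] = 5
ws["B1"].number_format = '"$"#,##0.00'
ws["B2"] = 5
buf = BytesIO()
wb.save(buf)
buf.seek(0)

ws = load_workbook(buf).active
for ref in ("B1", "B2"):
    cell = ws[ref]
    if looks_like_currency(cell.number_format):
        label = f"${cell.value:,.2f}"  # assumes a dollar format for display
    else:
        label = str(cell.value)
    print(ref, cell.number_format, label)
```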
