Excessive indirect references in NAME formula - python

I am trying to read 'xls' files in Python using pandas. My code is basically a one-liner:
import pandas as pd
df = pd.read_excel("/test/test_file.xls")
This code works for the majority of the files, but there are cases when it fails with the error:
Excessive indirect references in NAME formula
What I tried so far:
Tried raising the stack limits (panic and warning) as high as 10000 in the library code where the exception was raised (xlrd, which pandas uses for .xls files). That ran into Python's recursion limit, so I raised that as high as 125000, at which point my Mac/Python hit its own limits, so I'm guessing that's not the right solution.
Tried a memory-intensive EMR cluster to see if it could read the file: no luck.
Went to the xlrd GitHub repo to file a bug, only to find that the project is no longer supported.
Opened the file, saved it as .xlsx, and read it into a dataframe with the same code. Worked like a charm.
Tried the Spark Excel library to read a particular section of the data. This worked too, but I need to use pandas.
Googling the error only turned up the xlrd source where the exception is defined; nobody seems to have reported it.
Tried Python 2 and Python 3 with both the latest and older versions of pandas: no luck.
I cannot share the file, but has anyone faced this issue before? Can someone help? All suggestions are welcome!

Try the following:
Open the xls file
Copy/paste all cells as values
Rerun your script
It's hard to help further without access to the file to see exactly what is happening.
But chances are xlrd is trying to resolve the value of a formula and is exceeding its "STACK_PANIC_LEVEL". Without seeing the formula, it's very difficult to say more.

xlrd has a function, evaluate_name_formula(). When you open a .xls file with xlrd, it raises the error you described if the file's user-defined (named) formulas refer to one another too deeply. To work around it, I think you can delete these user-defined formulas so the file no longer contains them. Alternatively, you can try to edit the xlrd code to prevent it from raising the error, which seems much more difficult.
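The question notes that re-saving the file as .xlsx made it readable. If LibreOffice is installed, that round-trip can be scripted instead of done by hand. This is a minimal sketch under that assumption. It expects the `soffice` binary to be on the PATH and uses its `--headless --convert-to xlsx --outdir` options. The function names are hypothetical, and it is worth checking that the formulas and values you care about survive the conversion.

```python
import subprocess
from pathlib import Path


def soffice_convert_cmd(xls_path, outdir, soffice="soffice"):
    """Build a headless LibreOffice command converting xls_path to .xlsx in outdir."""
    return [soffice, "--headless", "--convert-to", "xlsx",
            "--outdir", str(outdir), str(xls_path)]


def converted_path(xls_path, outdir):
    """LibreOffice keeps the file stem and swaps the extension."""
    return Path(outdir) / (Path(xls_path).stem + ".xlsx")


def xls_to_xlsx(xls_path, outdir, soffice="soffice"):
    """Convert an .xls file with headless LibreOffice and return the new path.

    Assumes LibreOffice is installed (hypothetical helper). Raises
    subprocess.CalledProcessError if soffice fails, and an OSError such as
    FileNotFoundError if soffice is missing or no output file appears.
    """
    subprocess.run(soffice_convert_cmd(xls_path, outdir, soffice), check=True)
    out = converted_path(xls_path, outdir)
    if not out.exists():
        raise FileNotFoundError(out)
    return out
```

Usage would then be something like `df = pd.read_excel(xls_to_xlsx("/test/test_file.xls", "/tmp"))`. Recent pandas versions read .xlsx through openpyxl rather than xlrd, so xlrd's name-formula evaluation is never reached.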

Related

Removing Missing References from Excel Workbook using VBA

ISSUE:
Is there a way to remove missing references (references whose names are prefixed with "MISSING") from an Excel workbook using VBA? I have a macro-enabled Excel workbook that I share with my colleagues from time to time. The workbook uses a special add-in that relies on specific references. However, because of a limited number of licenses, not everyone has that add-in installed, so the references associated with it show up as "MISSING" in their copies and cause compile errors when they try to run any macro. I don't want my colleagues to have to go poking around the developer tab and uncheck the "MISSING" references each time they get the file. Can this be automated with VBA?
Issue Screenshot
STEPS TAKEN ALREADY:
I already tried the following code but it didn't work.
Option Explicit
Sub References_RemoveMissing()
    Dim theRef As Variant, i As Long
    For i = ThisWorkbook.VBProject.References.Count To 1 Step -1
        Set theRef = ThisWorkbook.VBProject.References.Item(i)
        If theRef.IsBroken = True Then
            ThisWorkbook.VBProject.References.Remove theRef
        End If
    Next i
End Sub
When I ran the above code, I got an error on line:
ThisWorkbook.VBProject.References.Remove theRef
The error stated: "Run-time error '-2147319779 (8002801d)': Object library not registered"
Error Screenshot
Before running the above code, I made sure that "Microsoft Visual Basic for Applications Extensibility 5.3" was checked in my Excel reference list and placed above the missing references (by increasing its priority).
My Excel security settings were also set to "Trust Access To Visual Basic Project".
I searched on Google, and this seems to be a fairly old issue. I found a number of old links suggesting the same code, but none of them work for me.
Excel VBA prevent from importing missing references (didn't work for me)
https://support.microsoft.com/en-us/topic/how-to-check-and-remove-incorrect-project-references-in-the-visual-basic-editor-in-word-7ba187a6-9dfd-1288-8f08-d1f01ea02a3f (didn't work for me)
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/b90ed5cc-6bd1-46b4-bbea-de4a15521b26/detect-and-remove-missing-references-in-vba-code?forum=exceldev (didn't work for me)
POSSIBLE ALTERNATE SOLUTION:
If the missing references cannot be removed using VBA, is it possible to have an external Python-based script, embedded within Excel VBA, that runs and removes the "MISSING" references?
Any help on this would be greatly appreciated! :)

How can I automatically calculate the formulas of an Excel file made with OpenPyxl?

I'm working on a program that creates an Excel file, then converts its contents to JSON and does further processing. I'm struggling with openpyxl: I found out today that if an Excel file created with openpyxl is never opened in Excel, its formulas won't be computed.
So when I write:
excel = load_workbook(self.path_excel, read_only=True, data_only=True)
I don't get the formula results, only None. If I set data_only=False instead, I get the original formulas. I know why this happens, and I'm trying to find an automatic way to open the Excel file, compute all the formulas in it, and close it, so that when I reopen it with openpyxl later in the code, the results are there.
I'm using Python btw.
Here is the result I get and what I want to get:
1: data_only=True
data_only=False
What I really want with data_only=True
'delta_1': '12345' and more answers with numbers like when I open it in excel...
Thanks for the eventual help :)
I'm answering my own post for future reference and for others having my same problem...
Basically what worked was:
Stop using PyCharm, as for some reason it was limiting my Python code;
Download Visual Studio Code;
Use the xlwings library as suggested by jezza_99 in the second comment.
Thanks for the help everyone :)
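The cause described in the question can be checked with openpyxl alone. openpyxl writes formulas but does not compute them, so with data_only=True those cells come back as None until something like Excel (for example, driven through xlwings) opens and saves the file. The hypothetical helper below is a sketch that lists formula cells with no cached result. It assumes ordinary formulas, which openpyxl exposes as strings starting with "="; array formulas are skipped.

```python
from openpyxl import load_workbook


def uncached_formula_cells(path):
    """Return 'Sheet!A1'-style references of formula cells with no cached value.

    Loads the workbook twice: once to see the formulas (data_only=False),
    once to see the values last saved by a spreadsheet app (data_only=True).
    """
    formulas = load_workbook(path, data_only=False)
    values = load_workbook(path, data_only=True)
    missing = []
    for ws in formulas.worksheets:
        cached = values[ws.title]
        for row in ws.iter_rows():
            for cell in row:
                # openpyxl stores ordinary formulas as strings starting with "="
                is_formula = isinstance(cell.value, str) and cell.value.startswith("=")
                if is_formula and cached[cell.coordinate].value is None:
                    missing.append(f"{ws.title}!{cell.coordinate}")
    return missing
```

An empty list after the xlwings step would suggest that Excel has written the computed values back to the file.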

Xlwings won't close book after saving

I am currently trying to use xlwings to open a workbook, update its links, then save and close it. The relevant code I am using is:
import os
import xlwings as xw
app = xw.App(add_book=False)
app.display_alerts = False
for file in os.scandir(dirname):
    if file.name.endswith("Unposted Summary.xlsm"):
        path = file.path
        tmp = app.books.api.Open(path, UpdateLinks=True)
        tmp.save(path)
app.quit()
After reading the documentation several times and trying several different methods (app.quit(), app.kill(), book.close(), etc.), I still can't get xlwings to close the current book after saving it, so I haven't even gotten to checking whether the links update properly.
I'm guessing the problem is coming from how I'm opening the books. If so, I don't know the syntax to close them.
I don't usually use xlwings, but from what I understand, app.books.api.Open goes straight to Excel and returns the underlying COM object, on which I don't think tmp.save(...) would even work (at least not in my case).
A better option would be to work directly with the xw.Book wrapper instead of going through the api call:
for file in os.scandir(dirname):
    if file.name.endswith("Unposted Summary.xlsm"):
        tmp = app.books.open(file.path, update_links=True)
        tmp.save()
        tmp.close()
I would also advise using os.path.abspath and keeping your working directory in mind while looping through dirname.
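Putting the answer's two suggestions together (the xw.Book wrapper plus absolute paths) gives the sketch below. It also wraps the work in try/finally so that Excel is told to close each book and quit even if a save fails. The function names are hypothetical. xlwings is imported inside refresh_links so the file-finding part can be used without it; running refresh_links still needs a local Excel installation.

```python
import os


def find_workbooks(dirname, suffix="Unposted Summary.xlsm"):
    """Return sorted absolute paths of files in dirname whose names end with suffix."""
    base = os.path.abspath(dirname)  # independent of the current working directory
    return sorted(
        os.path.join(base, entry.name)
        for entry in os.scandir(base)
        if entry.is_file() and entry.name.endswith(suffix)
    )


def refresh_links(paths):
    """Open each workbook with link updates, save, and close it (needs Excel)."""
    import xlwings as xw  # imported lazily: only needed when Excel is available

    app = xw.App(visible=False, add_book=False)
    app.display_alerts = False
    try:
        for path in paths:
            book = app.books.open(path, update_links=True)
            try:
                book.save()
            finally:
                book.close()  # close even if save() raised
    finally:
        app.quit()
```

Typical use would be `refresh_links(find_workbooks(dirname))`.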

Why are the indexing methods missing when working on a dataframe?

My code is an update of an existing script that outputs an xlsx file with a lot of data. The original script is pretty stable and has worked for ages.
What I'm trying to do is, after the original script has finished and the xlsx has been created, load the file into pandas and run a series of analyses on it using the methods .loc(), .iloc(), .index().
But after I read the file into a variable, when I hit '.' after the variable's name in PyCharm, I get all the dataframe and NDArray methods... except those three that I need.
No errors, no explanations. They are just not there.
And if I ignore this and type them manually anyway, the variable I store the result in shows no methods at all when I type '.' after it (instead of showing, say, the methods of a Series).
I've tried clearing the xlsx file of all formatting (it originally had hidden empty rows). I ran .info() and .head() to make sure both work (they seem to). I even ported my code from Python 2.7 to Python 3.7 with the 2to3 script to see if that would change anything. It didn't.
import pandas as pd
analysis_file = pd.read_excel("F:\\myprogram\\output1.xlsx", "Sheet1")
analysis_file. <--- The problem's here
Really not sure how to proceed, and no one I've asked so far has been able to help me.
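For reference, .loc and .iloc are indexer attributes used with square brackets rather than called like methods. .index is an attribute holding the row labels. The sketch below uses a small in-memory DataFrame with hypothetical columns in place of output1.xlsx. The explicit `pd.DataFrame` annotation is there on the assumption that it helps PyCharm's completion. read_excel can return either a DataFrame or a dict of DataFrames depending on sheet_name, which may be what confuses the IDE's type inference.

```python
import pandas as pd

# Stand-in for the data read from output1.xlsx (hypothetical columns).
# The annotation tells the IDE which type to offer completions for.
analysis_file: pd.DataFrame = pd.DataFrame(
    {"name": ["a", "b", "c"], "value": [10, 20, 30]}
)

# .loc: label-based indexer, used with square brackets
by_label = analysis_file.loc[analysis_file["value"] > 15, "name"]

# .iloc: position-based indexer, also used with square brackets
first_row = analysis_file.iloc[0]

# .index: an attribute holding the row labels, not a method
labels = list(analysis_file.index)
```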

How to get the modified date of an .xls file using xlrd

Let me try to improve this question, as it is still valid for me.
I have been using Openpyxl to read Excel files for a while. Now I need to extend the capability of my script to handle "legacy" Excel files that are not supported by Openpyxl. For this I use xlrd and xlutils.
One issue I have yet to solve is how to get the modified date of an .xls file when I don't have its path. With openpyxl I can get this from Workbook.properties.modified, as a datetime object, but with xlrd I am struggling. A workaround would be to figure out the path to the file (the input to my function may be a file-like object) and use os.path.getmtime, but I am not sure that is equivalent.
Any help is appreciated!
I just came across your post while searching for a solution myself. Seems like
wb = xlrd.open_workbook(filename=fn)
wb.props.get('modified')
does the job.
xlrd==1.2.0
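Two small helpers may be useful alongside this. Both are hypothetical sketches, not part of xlrd. As far as I can tell, xlrd fills `props` from the document properties of .xlsx files and may store the date as an ISO 8601 string rather than a datetime; it may also be empty for BIFF .xls files. That is worth checking against your own files. parse_modified normalises such a value to a datetime like openpyxl's. filesystem_mtime implements the question's os.path.getmtime fallback for paths and file objects opened from a path. That is the time the file was last written on disk, which is not necessarily the "modified" date stored inside the workbook (copying a file, for example, can change it).

```python
import datetime
import os


def parse_modified(value):
    """Normalise a 'modified' property to a datetime (or None).

    Passes datetime objects through unchanged and parses ISO 8601 strings
    such as '2019-05-01T10:20:30Z'.
    """
    if value is None or isinstance(value, datetime.datetime):
        return value
    # fromisoformat() on older Pythons does not accept a trailing 'Z'
    return datetime.datetime.fromisoformat(value.replace("Z", "+00:00"))


def filesystem_mtime(source):
    """Return the file-system mtime for a path or a file object opened from a path.

    Returns None for in-memory objects such as io.BytesIO, which have no path.
    """
    if isinstance(source, (str, os.PathLike)):
        path = source
    else:
        path = getattr(source, "name", None)
    if not isinstance(path, (str, os.PathLike)) or not os.path.exists(path):
        return None
    return datetime.datetime.fromtimestamp(os.path.getmtime(path))
```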
