I am currently trying to use xlwings to open a book and update its links, then save and close. The relevant code I am using is:
import os
import xlwings as xw
app=xw.App(add_book=False)
app.display_alerts=False
for file in os.scandir(dirname):
    if (file.name.endswith("Unposted Summary.xlsm")):
        path=file.path
        tmp=app.books.api.Open(path,UpdateLinks=True)
        tmp.save(path)
app.quit()
After having read the documentation several times and using several different methods such as app.quit(), app.kill(), book.close(), etc... I have been unable to get xlwings to close the current book after saving it, so I haven't even approached the question of whether the links are updating properly or not.
I'm guessing the problem is coming from how I'm opening the books. If so, I don't know the syntax to close them.
I don't usually use xlwings, but from what I understand app.books.api.Open calls and returns the underlying COM object, so I don't think tmp.save(...) would even work on it (at least not in my case).
A better option would be to work directly with the xw.Book wrapper, without the api call:
for file in os.scandir(dirname):
    if (file.name.endswith("Unposted Summary.xlsm")):
        tmp=app.books.open(file.path, update_links=True)
        tmp.save()
        tmp.close()
I would also advise you to use os.path.abspath and keep your working directory in mind while looping through dirname.
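The two suggestions above can be combined into one sketch. The helper names find_workbooks and refresh_links are made up for illustration, and the xlwings part assumes a machine with Excel installed, so treat it as a starting point rather than a tested solution:

```python
import os


def find_workbooks(dirname, suffix="Unposted Summary.xlsm"):
    """Return absolute paths of files in dirname whose names end with suffix."""
    return sorted(
        os.path.abspath(entry.path)
        for entry in os.scandir(dirname)
        if entry.is_file() and entry.name.endswith(suffix)
    )


def refresh_links(paths):
    """Open each workbook with link updating enabled, then save and close it."""
    import xlwings as xw  # imported here so the path helper works without Excel

    app = xw.App(visible=False, add_book=False)
    app.display_alerts = False
    try:
        for path in paths:
            book = app.books.open(path, update_links=True)
            book.save()
            book.close()
    finally:
        app.quit()  # runs even if one of the workbooks fails to open
```

Calling refresh_links(find_workbooks(dirname)) would then process every matching file. The try/finally is there so that the Excel process is shut down even when an exception interrupts the loop.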
This is a simple issue. I use Jupyter Notebook for Python and usually deal with PDFs using PyMuPDF.
I usually define pdf = fitz.open('dir/to/file.pdf') but sometimes I forget to close the file before I redefine pdf = fitz.open('dir/to/other_file.pdf')
Sometimes I need to (for example) move or delete file.pdf (the original file) but I can't because Python is using it.
Not being an expert, I don't know how to close this file once I have redefined the variable pdf, as obviously pdf.close() would close 'other_file.pdf', and I end up restarting my .ipynb file, which feels dumb.
How can I access an object whose variable name has been redefined?
If you do a Document.save("newname.pdf") then that new file, newname.pdf will be immediately available for other processes - it is not blocked by the process you are currently executing.
The original file however, oldname.pdf, from which you have created your Document object remains owned by your current process.
It will be freed if you do a Document.close().
But there is a way to work with the oldname.pdf under PyMuPDF without blocking it. It actually entails making a memory-resident copy:
import pathlib
import fitz
pdfbytes = pathlib.Path("oldname.pdf").read_bytes()
# from here on, file oldname.pdf is fully available
doc = fitz.open("pdf", pdfbytes)
# doc can be saved under any name, even the original one.
Writing this issue made me think about globals()
Browsing through its keys, I found that the objects whose variables have been reused are stored under dummy names (I don't know the term used for them). I found the object I was looking for and I was able to 'close' it.
If there's a better - more elegant solution, I'd be glad to hear about it.
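The dummy names are most likely IPython's output-cache variables (_, __, _1, _2 and so on), which keep a reference to any object a cell has displayed. One possible alternative to browsing globals() is to ask the garbage collector for live instances of a type. The sketch below uses a hypothetical Resource class as a stand-in for a document object, and a plain dict to imitate the notebook's output cache:

```python
import gc


class Resource:
    """Hypothetical stand-in for any object that holds a file until close()."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


def find_open(cls):
    """Return live, unclosed instances of cls known to the garbage collector."""
    return [
        obj for obj in gc.get_objects()
        if issubclass(type(obj), cls) and not obj.closed
    ]


# Imitate a notebook: the output cache keeps a hidden reference alive.
output_cache = {}
pdf = Resource("file.pdf")
output_cache["_1"] = pdf
pdf = Resource("other_file.pdf")  # the first object is no longer reachable as `pdf`

# Close only the object that still holds the original file.
for obj in find_open(Resource):
    if obj.name == "file.pdf":
        obj.close()
```

If nothing else references the old object, CPython frees it as soon as the name is rebound, so this search only matters when a hidden reference such as the output cache is keeping it alive.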
I am trying to read an 'xls' file in Python using pandas. My code is basically a one-liner:
import pandas as pd
df = pd.read_excel(str("/test/test_file.xls"))
This code works for the majority of the files, but there are cases when it fails with the error:
Excessive indirect references in NAME formula
What I tried so far:
Tried changing the stack limit(panic and warning) to as far as 10000 in the Pandas package itself, where the exception was occurring. A recursion limit was encountered, so raised it as far as 125000, which led to my Mac/Python reaching its limit so I am guessing not the right solution.
Used a memory-intensive EMR to see if it can read the file - nope.
Looked at the GitHub repo for XLRD here to raise a bug only to find out it's out of support.
Opened the file, saved it as an xlsx, used the same code to read it into a dataframe. Worked like a charm.
Tried using Spark Excel Library to read in a particular section of the data - this worked too but I need to use pandas.
Googled it only to find out the results would show me the XLRD code where the exception is defined. Not one person has reported it.
Tried using Python2 and Python3 with the latest and older versions of Pandas - no use.
I cannot share the file, but has anyone faced this issue before? Can someone help? All suggestions are welcome!
Try the following:
Open the xls file
Copy/paste all cells as values
Rerun your script
Hard to help further without having access to the file to explain exactly what is happening.
But chances are xlrd is trying to resolve the value of a formula and is exceeding the "STACK_PANIC_LEVEL". Without seeing the formula, very difficult to say more.
xlrd has a function, evaluate_name_formula(). When you open a .xls file with xlrd, it raises the error you describe if the file contains many user-defined formulas. To work around this, you can delete those formulas from the file. Alternatively, you can edit the xlrd code to stop it from raising the error, which seems much more difficult.
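Since only some files fail, it may help to separate the ones that load from the ones that need the copy/paste-as-values or save-as-xlsx treatment. A minimal sketch, where triage is a hypothetical helper and the match on the message text assumes the error reads exactly as quoted in the question:

```python
def triage(paths, reader):
    """Split paths into loaded results and files that hit the NAME-formula error."""
    loaded, failed = {}, []
    for path in paths:
        try:
            loaded[path] = reader(path)
        except Exception as exc:
            # Re-raise anything that is not the error from the question.
            if "Excessive indirect references" not in str(exc):
                raise
            failed.append(path)
    return loaded, failed


# Intended use (assumes pandas and xlrd are installed):
#   import pandas as pd
#   frames, needs_manual_fix = triage(list_of_xls_paths, pd.read_excel)
```

The reader is passed in as an argument so the same loop works with pd.read_excel or any other loader.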
I want to use Python to find the address or coordinates of the currently active or selected cell in an Excel spreadsheet's currently active sheet.
So far all I've been able to do is the latter. Perhaps I'm just using the wrong words to search. However, this is the first time in two years of writing first VBA and now Python that I haven't been able to just search and find the answer. Even if it took me half a day.
I've crawled through the code at readthedocs (http://openpyxl.readthedocs.org/en/latest/_modules/index.html)
and looked through the openpyxl.cell.cell, openpyxl.worksheet.worksheet, openpyxl.worksheet.views code. The last seemed to have some promise and led me to writing the code below. Still, no joy, and I don't seem to be able to phrase my online searches to be able to pinpoint results that talk about finding the actual active/selected cell. Perhaps this is because openpyxl is really looking at the saved spreadsheet which might not include any data on the last cell to be selected.
I've tried it both in Python 3.4.3 and 2.7.11. Using openpyxl 2.4.0.
Here's the code that got me the closest to my goal. I was running it in Python3.
from openpyxl.worksheet.views import Selection
import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
ws = wb.active
print(wb.get_sheet_names())
print(ws)
print(Selection.activeCell)
Which gives me the below.
['Sheet1', 'Sheet2', 'Sheet3']
<Worksheet "Sheet3">
Values must be of type <class 'str'>
I put in the first two prints just to prove to myself that I'm actually accessing the workbook/sheet.
If I change the last line to:
print(Selection.activeCellId)
I get:
Values must be of type <class 'int'>
I assume this is because these are only for writing not querying. I've toyed with the idea of writing a VBA macro and just running it from python. However, this code will be used with spreadsheets I don't control. By people who aren't necessarily capable of fixing any problems. I don't think I'm capable of writing something good enough to handle any problems that might crop up either.
Any help will be greatly appreciated.
It's difficult to see the purpose of an active cell for a library like openpyxl as it is effectively a GUI artefact. Nevertheless, because openpyxl works hard to implement the OOXML specification it should be possible to read the value stored by the previous application, or write it.
ws.views.sheetView[0].selection[0].activeCell
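To show where that value lives in the saved file, here is a sketch that reads it with only the standard library instead of openpyxl. The function name stored_active_cell and the default part name xl/worksheets/sheet1.xml are assumptions for illustration; a real workbook maps sheet names to part names through its relationship files, which this sketch does not resolve:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def stored_active_cell(xlsx, sheet_part="xl/worksheets/sheet1.xml"):
    """Return the activeCell saved in a sheet part, or None if none is stored."""
    with zipfile.ZipFile(xlsx) as zf:
        root = ET.fromstring(zf.read(sheet_part))
    # Only the first <selection> is inspected; split panes can store several.
    selection = root.find("m:sheetViews/m:sheetView/m:selection", NS)
    return None if selection is None else selection.get("activeCell")


# Build a minimal in-memory package containing one sheet part to try it out.
sheet_xml = (
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<sheetViews><sheetView workbookViewId="0">'
    '<selection activeCell="C5" sqref="C5"/>'
    '</sheetView></sheetViews><sheetData/></worksheet>'
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/worksheets/sheet1.xml", sheet_xml)
buf.seek(0)
print(stored_active_cell(buf))  # C5
```

Either way, the result is whatever the last application wrote when it saved the file, not the live selection in a running Excel window.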
Consider the win32com library to replicate the Excel VBA property, ActiveCell. Openpyxl might have a limited method for this property, while win32com allows Python to fully utilize the COM libraries of Windows programs, including the MS Office Suite (Excel, Word, Access, etc.). You can even manipulate files as a child process as if you were directly writing VBA.
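import os
import win32com.client
# OPEN EXCEL APP AND SPREADSHEET
# (a relative path would be resolved against Excel's working directory, not Python's)
xlApp = win32com.client.Dispatch("Excel.Application")
xlApp.Workbooks.Open(os.path.abspath('example.xlsx'))
xlApp.ActiveWorkbook.Worksheets('Sheet1').Activate()
# Address gives the location (e.g. $A$1); printing ActiveCell alone shows the cell's value
print(xlApp.ActiveCell.Address)
xlApp.ActiveWorkbook.Close(False)
xlApp.Quit()
xlApp = None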
This question is for calling Python from Excel. In VBA you'd do your RunPython("import mymodule; mymodule.my_function()")
In Python you would have something like,
from xlwings import Workbook, Sheet, Range
def my_function():
    wb = Workbook.caller() # Create reference to calling Excel file
    Range('A1:C3').clear_contents() # Clear some cells
My question is that this will work for your first instance of Excel. But in the event you have two instances open, and you're trying to run the code in the second instance, you will get a raised exception saying, "Can't establish connection! Make sure the calling workbook is the active one and is opened in the first instance of Excel."
So it seems like this is designed to work on only the first instance. Is there a way around this? Can you identify which instance you're in in the Python script? The user was hoping to run VBA macros that call Excel across multiple instances.
In principle, xlwings can deal with 2 instances. However, depending on your security settings, it might treat files downloaded from the internet or stored on a network drive as unsecure and "sandbox" them. These files are only accessible to xlwings if they run in the first instance.
If this sounds like your issue, then lowering the security settings could potentially solve the issue. See also this answer here.
Does anyone know of a way of accessing MS Excel from Python? Specifically I am looking to create new sheets and fill them with data, including formulae.
Preferably I would like to do this on Linux if possible, but I can do it from within a VM if there is no other way.
xlwt and xlrd can read and write Excel files, without using Excel itself:
http://www.python-excel.org/
A long time after the original question, but the last answer pushed it to the top of the feed again. Others might benefit from my experience using Python and Excel.
I use Excel and Python quite a bit. Instead of using the xlrd and xlwt modules directly, I normally use pandas. I think pandas uses these modules under the hood, but I find it much easier to use the framework pandas provides to create and read spreadsheets. The pandas DataFrame structure is very "spreadsheet-like" and makes life a lot easier in my opinion.
The other option that I use (not in direct answer to your problem) is DataNitro. It allows you to use python directly within excel. Different use case, but you would use it where you would normally have to write VBA code in Excel.
There is a Python library to read/write Excel 2007 xlsx/xlsm files: http://pythonhosted.org/openpyxl/
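As a rough illustration of the original request (new sheets, data and formulae, no Excel needed, so it should also work on Linux), a sketch with openpyxl might look like this. The sheet name and cell contents are made up, and the workbook is written to an in-memory buffer only to keep the example self-contained:

```python
import io

import openpyxl

wb = openpyxl.Workbook()
ws = wb.create_sheet("Data")  # add a new sheet next to the default one

# Plain values
ws["A1"] = 10
ws["A2"] = 32

# A formula is stored as a string starting with "="; Excel evaluates it on open
ws["A3"] = "=SUM(A1:A2)"

# Save to a buffer here; pass a filename such as "report.xlsx" to write to disk
buffer = io.BytesIO()
wb.save(buffer)
```

openpyxl does not calculate the formula itself, so the result of A3 only appears once the file is opened in a spreadsheet application.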
I wrote a Python class that allows working with Excel via the COM interface on Windows: http://sourceforge.net/projects/excelcomforpython/
The class uses win32com to interact with Excel. You can use the class directly or use it as an example. A lot of options are implemented, such as array formulas, conditional formatting, charts, etc.
It's surely possible through the Excel object model via COM: just use the win32com modules for Python. I can't remember more, but I once controlled the Media Player through COM from Python. It was a piece of cake.
It's actually very simple. You can run anything from any program; you just need a way to reach the command prompt from that program. In the case of Excel, create a user-defined function by pressing Alt+F11 and pasting the following code.
Function call_cmd()
    Shell "CMD /C Notepad", vbNormalFocus
End Function
Now press Ctrl+S and go back to Excel, select a cell and run the function =call_cmd(). Here I ran Notepad. In the same way, you can find where python.exe is installed and run it. If you want to pass any inputs to Python, save the cells as a CSV file in a local directory and read that file from the Python script.
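On the Python side, reading the exported CSV could be sketched as follows. The function name read_cells is made up for illustration, and the encoding is an assumption: utf-8-sig suits Excel's "CSV UTF-8" format, while a plain "CSV" export may use the local Windows code page instead:

```python
import csv


def read_cells(csv_path):
    """Return the rows of a CSV export as lists of strings."""
    # utf-8-sig drops the byte-order mark that Excel's "CSV UTF-8" format writes
    with open(csv_path, newline="", encoding="utf-8-sig") as fh:
        return list(csv.reader(fh))
```

The script launched from the Shell call could take the CSV path as a command-line argument (sys.argv[1]) and pass it to read_cells.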