Cannot read an HDF file when using xlwings - HDFStore Requires PyTables - python

I am trying to create various Excel UDFs in python by using xlwings. My UDFs rely on values that are pulled from an HDF file. However, every time I click the "Import Functions" button in Excel, I receive an error. Below is an example.
import pandas as pd
import numpy as np
import xlwings as xw
matrix1 = pd.DataFrame(np.random.random(size = (1000, 1000)))
matrix2 = pd.DataFrame(np.random.random(size = (1000, 100)))
matrix1.to_hdf('matrix.h5', key = 'mat1', mode = 'w')
matrix2.to_hdf('matrix.h5', key = 'mat2', mode = 'a')
arg = pd.read_hdf('matrix.h5', key = 'mat2', mode = 'r')
#xw.func
def dummy(x, y):
return 17
When I click on the "Import Functions" button in the xlwings ribbon in Excel, I receive the following
If I try to run the program with Spyder, I have no issues and can generate the HDF files just fine.
Interestingly, if I remove the lines where I write the HDF file, and just leave the one where I read it, I get an error saying
FileNotFoundError: File matrix.h5 does not exist ...
Even though I have confirmed that the file does exist. If I run the same code in Spyder, I have no issues, it works fine.
Is there some kind of compatibility issue with xlwings and HDF files, or am I missing something?

Can't see xlwings being used for anything in the examples. It is however true that the PyTables is required. Try running pip install tables to install it.

Related

Openpyxl Reading Non-empty Cells as None [duplicate]

I have a simple excel file:
A1 = 200
A2 = 300
A3 = =SUM(A1:A2)
this file works in excel and shows proper value for SUM, but while using openpyxl module for python I cannot get value in data_only=True mode
Python code from shell:
wb = openpyxl.load_workbook('writeFormula.xlsx', data_only = True)
sheet = wb.active
sheet['A3']
<Cell Sheet.A3> # python response
print(sheet['A3'].value)
None # python response
while:
wb2 = openpyxl.load_workbook('writeFormula.xlsx')
sheet2 = wb2.active
sheet2['A3'].value
'=SUM(A1:A2)' # python response
Any suggestions what am I doing wrong?
It depends upon the provenance of the file. data_only=True depends upon the value of the formula being cached by an application like Excel. If, however, the file was created by openpyxl or a similar library, then it's probable that the formula was never evaluated and, thus, no cached value is available and openpyxl will report None as the value.
I have replicated the issue with Openpyxl and Python.
I am currently using openpyxl version 2.6.3 and Python 3.7.4. Also I am assuming that you are trying to complete an exercise from ATBSWP by Al Sweigart.
I tried and tested Charlie Clark's answer, considering that Excel may indeed cache values. I opened the spreadsheet in Excel, copied and pasted the formula into the same exact cell, and finally saved the workbook. Upon reopening the workbook in Python with Openpyxl with the data_only=True option, and reading the value of this cell, I saw the proper value, 500, instead of the wrong value, the None type.
I hope this helps.
I had the same issue. This may not be the most elegant solution, but this is what worked for me:
import xlwings
from openpyxl import load_workbook
excel_app = xlwings.App(visible=False)
excel_book = excel_app.books.open('writeFormula.xlsx')
excel_book.save()
excel_book.close()
excel_app.quit()
workbook = load_workbook(filename='writeFormula.xlsx', data_only=True)
I have suggestion to this problem. Convert xlsx file to csv :).
You will still have the original xlsx file. The conversion is done by libreoffice (it is that subprocess.call() line).You can use also Pandas for this as a more pythonic way.
from subprocess import call
from openpyxl import load_workbook
from csv import reader
filename="test"
wb = load_workbook(filename+".xlsx")
spread_range = wb['Sheet1']
#what ever function there is in A1 cell to be evaluated
print(spread_range.cell(row=1,column=1).value)
wb.close()
#this line can be done with subprocess or os.system()
#libreoffice --headless --convert-to csv $filename --outdir $outdir
call("libreoffice --headless --convert-to csv "+filename+".xlsx", shell=True)
with open(filename+".csv", newline='') as f:
reader = reader(f)
data = list(reader)
print(data[0][0])
or
# importing pandas as pd
import pandas as pd
# read an excel file and convert
# into a dataframe object
df = pd.DataFrame(pd.read_excel("Test.xlsx"))
# show the dataframe
df
I hope this helps somebody :-)
Yes, #Beno is right. If you want to edit the file without touching it, you can make a little "robot" that edits your excel file.
WARNING: This is a recursive way to edit the excel file. These libraries are depend on your machine, make sure you set time.sleep properly before continuing the rest of the code.
For instance, I use time.sleep, subprocess.Popen, and pywinauto.keyboard.send_keys, just add random character to any cell that you set, then save it. Then the data_only=True is working perfectly.
for more info about pywinauto.keyboard: pywinauto.keyboard
# import these stuff
import subprocess
from pywinauto.keyboard import send_keys
import time
import pygetwindow as gw
import pywinauto
excel_path = r"C:\Program Files\Microsoft Office\root\Office16\EXCEL.EXE"
excel_file_path = r"D:\test.xlsx"
def focus_to_window(window_title=None): # function to focus to window. https://stackoverflow.com/a/65623513/8903813
window = gw.getWindowsWithTitle(window_title)[0]
if not window.isActive:
pywinauto.application.Application().connect(handle=window._hWnd).top_window().set_focus()
subprocess.Popen([excel_path, excel_file_path])
time.sleep(1.5) # wait excel to open. Depends on your machine, set it propoerly
focus_to_window("Excel") # focus to that opened file
send_keys('%{F3}') # excel's name box | ALT+F3
send_keys('AA1{ENTER}') # whatever cell do you want to insert somthing | Type 'AA1' then press Enter
send_keys('Stackoverflow.com') # put whatever you want | Type 'Stackoverflow.com'
send_keys('^s') # save | CTRL+S
send_keys('%{F4}') # exit | ALT+F4
print("Done")
Sorry for my bad english.
As others already mentioned, Openpyxl only reads cashed formula value in data_only mode. I have used PyWin32 to open and save each XLSX file before it's processed by Openpyxl to read the formulas result value. This works for me well, as I don't process large files. This solution will work only if you have MS Excel installed on your PC.
import os
import win32com.client
from openpyxl import load_workbook
# Opening and saving XLSX file, so results for each stored formula can be evaluated and cashed so OpenPyXL can read them.
excel_file = os.path.join(path, file)
excel = win32com.client.gencache.EnsureDispatch('Excel.Application')
excel.DisplayAlerts = False # disabling prompts to overwrite existing file
excel.Workbooks.Open(excel_file )
excel.ActiveWorkbook.SaveAs(excel_file, FileFormat=51, ConflictResolution=2)
excel.DisplayAlerts = True # enabling prompts
excel.ActiveWorkbook.Close()
wb = load_workbook(excel_file)
# read your formula values with openpyxl and do other stuff here
I ran into the same issue. After reading through this thread I managed to fix it by simply opening the excel file, making a change then saving the file again. What a weird issue.

How to suppress "Update Links" Alert with xlwings

I am interfacing with Excel files in Python using the xlwings api. Some Excel files I am interacting with have old links which cause a prompt to appear when the file is opened asking if the user would like to update the links. This causes the code to hang indefinitely on the line that opened the book until this prompt is closed by a user. Is there a way to modify the settings of the Excel file so that this prompt will not appear or it will be automatically dismissed without opening the actual file?
I have tried using the xlwings method:
xlwings.App.display_alerts = False
to suppress the prompt, but as far as I can tell this can only be run for an instance of Excel after it has been opened. There are some Excel api's that do not require a file to be open in order to read data like xlrd, but they are not very convenient for reading and copying large amounts of data (Multiple/Entire sheets of data).
The following code demonstrates the issue:
import xlwings as xw
wb = xw.Book(r'C:\Path\To\File\Filename')
print('Done')
On a regular Excel file the code proceeds through and prints "Done" without the need of user interference, but on an Excel file where the "update links" prompt comes up, it will not proceed to the print statement until the prompt is dismissed by a user.
Expanding on your first attempt -- you're not handling an App instance, rather you're trying to assign to the xlwings.App class.
However, it seems that the display_alerts doesn't successfully suppress this alert in xlwings, try this:
import xlwings as xw
app = xw.App(add_book=False)
app.display_alerts = False
wb = app.books.api.Open(fullpath, UpdateLinks=False)
I believe there is an implementation in xlwings to avoid update links messages now. I was able to bypass these alerts by adding the following
app.books.open(fname, update_links=False, read_only=True, ignore_read_only_recommended=True)
You can see these arguments available in the documentation xlwings.Book.open(...)
I presently have 20+ source workbooks that I loop though to extract some rows of data. It was intolerable to respond to the update links prompt of each opened workbook. I tried the other solutions here but none worked for me. After reviewing the cited xlwings docs, this is the solution that worked for me:
for fname in workbook_list:
wb = xw.books.open(fname, update_links = False)
# Extract some data...
wb.close()
My environment is Win10Pro / Python 3.8.1 / pywin32 version: 303 / Excel 365 Subscription / xlwings 0.26.2

Reading xls file after resaving

I am downloading an excel file from a website.
If I just use pandas to open the file
import pandas as pd
df = pd.read_excel('filepath')
I get an error CompDocError: Workbook corruption: seen[2] == 4
If I resave file before opening it everything works fine
import pandas as pd
import win32com.client
def resave_excel(filename):
xcl = win32com.client.Dispatch('Excel.Application')
wb = xcl.workbooks.open(filename)
xcl.DisplayAlerts = False
wb.Save()
xcl.Quit()
resave_excel('filepath')
df = pd.read_excel('filepath')
The problem with this approach is that I actually call Excel application and it is not the safest thing to do, especially if I want to run the full script on some automated basis or if I want to run it on a different platform.
Is there a different approach that I am missing?
The only solution that I found is discussed on https://github.com/python-excel/xlrd/issues/149.
Instead of pandas you need to use xlrd and make changes to xlrd/compdoc.py.

XLWings data link warning not caught by display_alerts [duplicate]

I am interfacing with Excel files in Python using the xlwings api. Some Excel files I am interacting with have old links which cause a prompt to appear when the file is opened asking if the user would like to update the links. This causes the code to hang indefinitely on the line that opened the book until this prompt is closed by a user. Is there a way to modify the settings of the Excel file so that this prompt will not appear or it will be automatically dismissed without opening the actual file?
I have tried using the xlwings method:
xlwings.App.display_alerts = False
to suppress the prompt, but as far as I can tell this can only be run for an instance of Excel after it has been opened. There are some Excel api's that do not require a file to be open in order to read data like xlrd, but they are not very convenient for reading and copying large amounts of data (Multiple/Entire sheets of data).
The following code demonstrates the issue:
import xlwings as xw
wb = xw.Book(r'C:\Path\To\File\Filename')
print('Done')
On a regular Excel file the code proceeds through and prints "Done" without the need of user interference, but on an Excel file where the "update links" prompt comes up, it will not proceed to the print statement until the prompt is dismissed by a user.
Expanding on your first attempt -- you're not handling an App instance, rather you're trying to assign to the xlwings.App class.
However, it seems that the display_alerts doesn't successfully suppress this alert in xlwings, try this:
import xlwings as xw
app = xw.App(add_book=False)
app.display_alerts = False
wb = app.books.api.Open(fullpath, UpdateLinks=False)
I believe there is an implementation in xlwings to avoid update links messages now. I was able to bypass these alerts by adding the following
app.books.open(fname, update_links=False, read_only=True, ignore_read_only_recommended=True)
You can see these arguments available in the documentation xlwings.Book.open(...)
I presently have 20+ source workbooks that I loop though to extract some rows of data. It was intolerable to respond to the update links prompt of each opened workbook. I tried the other solutions here but none worked for me. After reviewing the cited xlwings docs, this is the solution that worked for me:
for fname in workbook_list:
wb = xw.books.open(fname, update_links = False)
# Extract some data...
wb.close()
My environment is Win10Pro / Python 3.8.1 / pywin32 version: 303 / Excel 365 Subscription / xlwings 0.26.2

Python Jupyter Notebook - Unable to open CSV file through a path

I am a little new to Python, and I have been using the Jupyter Notebook through Anaconda. I am trying to import a csv file to make a DataFrame, but I am unable to import the file.
Here is an attempt using the local method:
df = pd.read_csv('Workbook1')
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-11-a2deb4e316ab> in <module>()
----> 1 df = pd.read_csv('Workbook1')
After that I tried using the path (I put user for my username)
df = pd.read_csv('Users/user/Desktop/Workbook1.csv')
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-13-3f2bedd6c4de> in <module>()
----> 1 df = pd.read_csv('Users/user/Desktop/Workbook1.csv')
I am using a Mac, which I am also new to, and I am not 100% sure if I am correctly importing the right path. Can anyone offer some insight or solutions that would allow me to open this csv file.
Instead of providing path, you can set a path using the code below:
import os
import pandas as pd
os.chdir("D:/dataset")
data = pd.read_csv("workbook1.csv")
This will surely work.
Are you sure that the file exists in the location you are specifying to the pandas read_csv method? You can check using the os python built in module:
import os
os.path.isfile('/Users/user/Desktop/Workbook1.csv')
Another way of checking if the file of interest is in the current working directory within a Jupyter notebook is by running ls -l within a cell:
ls -l
I think the problem is probably in the location of the file:
df1 = pd.read_csv('C:/Users/owner/Desktop/contacts.csv')
Having done that, now you can play around with the big file if you have, and create useful data with:
df1.head()
The OS module in python provides functions for interacting with the operating system. OS, comes under Python’s standard utility modules.
import os
import pandas as pd
os.chdir("c:\Pandas")
df=pd.read_csv("names.csv")
df
This might help. :)
The file name is case sensitive, so check your case.
I had the same problem on a Mac too and for some reason it only happened to me there. And I tried to use many tricks but nothing works. I recommend you go directly to the file, right click and then press “alt” key after that the option to “copy route” will appear, and just paste it into your jupyter. For some reason that worked to me.
I believe the issue is that you're not using fully qualified paths. Try this:
Move the data into a suitable project directory. You can do this using the %%bash Magic commands.
%%bash
mkdir -p /project/data/
cp data.csv /project/data/data.csv
You can read the file
f = open("/project/data/data.csv","r")
print(f.read())
f.close()
But it might be most useful to load it into a library.
import pandas as pd
data = pd.read_csv("/project/data/data.csv")
I’ve created a runnable Jupyter notebook with more details here: Jupyter Basics: Reading Files.
Try double quotes, instead of single quotes. it worked for me.
you can open csv files in Jupyter notebook by following these easy steps-
Step 1 - create a directory or folder (you can also use old created folder)
Step 2 - Change your Jupyter working directory to that created directory -
import os
os.chdir('D:/datascience/csvfiles')
Step 3 - Now your directory is changed in Jupyter Notebook. Store your file(s) in that directory.
Step 4 - Open your file -
import pandas as pd
df = pd.read_csv("workbook1.csv")
Now your file is read and stored in a Data Frame variable df, you can display this file content by following
df.head() - display first five rows of this file
df - display all rows of this file
Happy Data Science!
There was a similar problem for me while reading a CSV file in Jupyter notebook from the computer.
I solved it by substituting the "" symbol with "/" in the path like this.
This is what I had:
"C:\Users\RAJ\Desktop\HRPrediction\HRprediction.csv"
This is what I changed it for:
"C:/Users/RAJ/Desktop/HRPrediction/HRprediction.csv".
This is what worked for me. I am using Mac OS.
Save your CSV on a separate folder on your desktop.
When opening a Jupyter notebook press on the same folder that your dataset is currently saved in. Press new notebook in the upper right hand corner.
After opening a new notebook. Code as per usual and read your data using import pandas as pd and pd.read_csv calling to your dataset.
 No need to use anything extra just use r in front of the location.
df = pd.read_csv(r'C:/Users/owner/Desktop/contacts.csv'

Categories