I have tons of ".xls" Excel files that contain #NAME errors.
I need to open each one and collect data from a specific range, but when I try to open one with xlrd I get the following error: "ERROR *** Token 0x2d (AreaN) found in NAME formula"
Code is below:
import xlrd
book = xlrd.open_workbook(r"C:\Users\metin.unlu\Desktop\Python\renk_study\labs\2-19.xls",ignore_workbook_corruption=True)
sheet=book.sheet_by_index(0)
While the error is self-explanatory and I know the cause of it, I have no idea how to solve it without opening each Excel file manually and fixing it.
I am trying to learn Python (day 2) and am hoping to practice with Excel books first as this is where I am comfortable/fluent.
Right off the bat I am getting an error that I don't quite understand when running the code below:
import openpyxl
wb = openpyxl.load_workbook("/Users/Scott/Desktop/Workbook1.xlsx")
print(wb.sheetnames)
This does print my sheet names as requested, but it is followed by:
/Users/Scott/PycharmProjects/Excel/venv/lib/python3.7/site-packages/openpyxl/worksheet/_reader.py:293: UserWarning: Unknown extension is not supported and will be removed
warn(msg)
I have found other questions that point to slicers/conditional formatting etc, but that does not apply here. This is a book I just made and only added 3 sheets before saving. It has no data, no formatting, and the extension is valid. I have no add-ons installed on my excel either.
Any idea why I am getting this error? How do I resolve it?
Python: 3.7
openpyxl: 2.6
I had a similar issue. I developed an application which reads and writes Excel files. It worked well on my Windows computer, but then I tried to run it on a friend's Mac. It showed the same error. I could "fix" it by changing the configuration of the workbook, like this:
import openpyxl as op
wb = op.load_workbook(file, read_only=True, data_only=True)
But, as you can see, you can only read Excel files with this configuration. In the end, I realized that my friend didn't have Microsoft Office installed on his computer. Installing it truly solved my problem.
This question was from a couple years ago but I'm encountering it now with openpyxl and require a fix, as the warning is confounding and misleading to my end users.
The warning from openpyxl comes via the stdlib warnings library, which can be suppressed.
import warnings
warnings.simplefilter("ignore")
That's the "hit it with a hammer" approach. More granular levels of warnings suppression can be found here: https://docs.python.org/3/library/warnings.html
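As a rough sketch of a more granular approach, the filter can be limited both to one message and to one call by using warnings.catch_warnings(). The helper name load_quietly is made up for illustration, and the message prefix is taken from the warning text quoted in the question:

```python
import warnings


def load_quietly(loader, *args, **kwargs):
    """Call `loader` while ignoring only the "Unknown extension" UserWarning.

    `load_quietly` is a hypothetical helper name, not part of openpyxl.
    The filter is active only inside the `with` block, so every other
    warning in the program is still shown.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message="Unknown extension",  # matched against the start of the text
            category=UserWarning,
        )
        return loader(*args, **kwargs)
```

It would be called as load_quietly(openpyxl.load_workbook, path), so the previous warning filters are restored as soon as the workbook has been loaded.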
This is exactly the problem I encountered just now.
In my situation (this may not apply to everyone), I discovered that I just needed to close Excel and rerun the code. Very simple.
If this doesn't work, you can refer to other answers.
Thanks
To understand the error, you need to know what's inside an XLSX file. The best way to take a look is to change the extension to zip and open that. Inside you will see a file called [Content_Types].xml and directories for the other content. If you check out the XML in Content_Types you will see a <Types ...> tag containing other tags like this:
<Default Extension="png" ContentType="image/png"/>
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Default Extension="xml" ContentType="application/xml"/>
Note the "Extension" property. This is what the warning refers to. In the example above, my file included Extension="png" - the unknown extension.
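If you would rather not rename the file, the same entries can be listed with the standard library. This is only a sketch: the function name list_default_extensions is made up, and it assumes the usual content-types XML namespace used by Office Open XML packages:

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of [Content_Types].xml in Office Open XML packages (assumed here).
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def list_default_extensions(xlsx_path):
    """Return {extension: content type} for the <Default> entries of a workbook.

    `xlsx_path` can be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx_path) as zf:
        root = ET.fromstring(zf.read("[Content_Types].xml"))
    return {
        el.get("Extension"): el.get("ContentType")
        for el in root.iter(CT_NS + "Default")
    }
```

Running it on a workbook before and after removing the images should show whether the png entry is still present.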
For me, it was enough to specify read_only=True and the warning went away, e.g.:
wb = openpyxl.load_workbook(file, read_only=True)
I could also fix the issue by copying everything except the images to a new workbook and saving that. After checking, the xml in the new workbook no longer contained the png property.
Note that reading into pandas with pd.read_excel uses openpyxl and generates the same "Unknown extension" warning, but there is no way to pass through the read_only parameter. You can suppress the specific warning with:
import warnings
warnings.filterwarnings('ignore', category=UserWarning, module='openpyxl')
Overview:
I have a function in an external file that returns a dataframe that I then save to a csv with:
df.to_csv('filepath.csv', na_rep="0", index=False)
I then try to import the csv into a postgres table using the psycopg2 function copy_from:
try:
    connect = psycopg2.connect(database = "", user = "", password = "", host = "", port = "")
except:
    print("Could not connect to database")
cursor = connect.cursor()
with open("filepath", 'r') as open_csv:
    next(open_csv)  # skip the header row
    try:
        cursor.copy_from(open_csv, sep=",")
        connect.commit()
        print("Copy Complete")
    except:
        print("Copy Error")
cursor.close()
This results in a copy error exception in the code above (so no real detail) but there are some weird caveats:
For some reason, if I open the csv in LibreOffice, manually save it as a text csv and then run just the above psycopg2 copy_from process, the copy works and there are no issues. So for whatever reason, in the eyes of psycopg2 copy_from, something is off with the to_csv() write that gets fixed if I just manually save the file. Manually saving the csv does not result in any visual changes, so what is happening here?
Also, the above psycopg2 code snippet works without error in another file in which all dataframe manipulation is contained within the single file where the to_csv() is completed. So is something about returning a dataframe from a function in an external file off?
Fwiw, when debugging, the issue came up on the .copy_from() line so the issue has something to do with csv formatting and I cannot figure it out. I found a workaround with sqlalchemy but would like to know what I am missing here instead of ignoring the problem.
In the postgres error log, the error is: "invalid input syntax for type integer: "-1.0"". This error occurs in the last column of my table, where the value is set as an INT. In the csv the value is -1, but it is being interpreted as -1.0. Where I am confused is that if I use a COPY query to directly input the csv file into postgres, it does not have a problem. Why does it interpret the value as -1.0 through psycopg2 but not directly in postgres?
This is my first post so if more detail is needed let me know - thanks in advance for the help
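One way to check what is actually in the file, rather than what a spreadsheet program displays, is to scan the raw text for integers written with a trailing ".0". This is a diagnostic sketch only; the helper name find_integral_floats and the regular expression are illustrative assumptions, and it does not claim to identify the root cause:

```python
import csv
import re

# Matches text such as "-1.0" or "5.00": a whole number written as a float.
INTEGRAL_FLOAT = re.compile(r"^-?\d+\.0+$")


def find_integral_floats(lines, column=-1):
    """Return (row_number, raw_value) pairs whose field looks like "-1.0".

    `lines` is any iterable of text lines (for example an open file);
    `column` defaults to the last column, where the post saw the error.
    """
    hits = []
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if row and INTEGRAL_FLOAT.match(row[column]):
            hits.append((row_number, row[column]))
    return hits
```

Calling it on the file written by to_csv() and again on the copy re-saved from LibreOffice would show whether the two files really contain the same text in that column.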
I'm having issues when I try to write a file in Python with a specific file name rather than just a path. Here is what I'm trying to do:
page_title=page.find('title')
raw_data_path ='output/'+page_title+'_raw.txt'
print (page_title)
with open(raw_data_path, 'w') as file:
    file.write ()
When running this code, I receive the error [Errno 22] Invalid argument: 'output/myfile_raw.txt'
If instead of raw_data_path ='output/'+page_title+'_raw.txt' I put, for example, raw_data_path ='output/'+'_raw.txt' the code works well, so for some reason I can't combine the path with the name I'm trying to give the file.
I've searched for that error and I see that it is a routing error, so it might be happening because something goes wrong with the path when I add page_title, but I can't see what the mistake is because it should be working.
Can someone give me some help with this issue?
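A possible cause, which the post does not confirm, is that the page title contains characters that are not allowed in file names on some systems, such as a colon, a question mark or a newline. A minimal sketch of cleaning the title before building the path follows; the helper name safe_filename and the character list are assumptions for illustration:

```python
import re


def safe_filename(title, replacement="_"):
    """Replace characters that Windows rejects in file names.

    Hypothetical helper: the character class covers < > : " / \\ | ? *
    and ASCII control characters such as newlines.
    """
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', replacement, str(title))
    # Leading/trailing spaces and dots are also problematic on Windows.
    return cleaned.strip(" .") or "untitled"
```

The path would then be built as 'output/' + safe_filename(page_title) + '_raw.txt'. Printing repr(page_title) first is a quick way to see whether hidden characters are present at all.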
I am working with a 150GB+ zipfile of dicom images.
I am trying to extract some of these by their filenames.
I am working in the Google Colab Python 3 interpreter and use the zipfile module's ZipFile.extractall() method on a list of 500 filenames (e.g. ['stage_1_train_images/ID_53ff71bc4.dcm', 'stage_1_train_images/ID_001bb2c00.dcm', etc...]).
Here's my code :
from zipfile import ZipFile
with ZipFile(src, 'r') as zipObj:
    zipObj.extractall(members = ids, path = '/content/drive/My Drive/RSNA IH DETECTION CHALLENGE/DICOM') #ids is my file list
I got an error message:
"OSError: [Errno 5] Input/output error", related to a message in read(self, n): 727 "Close the writing handle before trying to read."
I tried to close the file and re-open it, and tried to extract with the .extract() method several times, and always got the same error message.
I first attempted to use fuse-zip along with shutil.copyfile() but it failed too...
Do you know what's causing this error message and a possible way to fix it?
I fixed this problem by re-downloading the original file.
My version, stored on Google Drive, was probably corrupted.
Now fuse-zip works as well as the ZipFile methods!
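For anyone who wants to check for corruption before re-downloading, the zipfile module has a built-in integrity test. A small sketch, with first_corrupt_member as a made-up helper name:

```python
import zipfile


def first_corrupt_member(archive):
    """Return the name of the first member that fails its CRC check, or None.

    `archive` can be a path or a binary file-like object. ZipFile.testzip()
    reads every member, so on a very large archive this is slow.
    """
    with zipfile.ZipFile(archive) as zf:
        return zf.testzip()
```

A result of None means every member passed its check. If the archive is damaged badly enough, opening it may raise zipfile.BadZipFile instead, which is itself a sign that the copy is unusable.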
I have the following function that will read from an excel workbook with the openpyxl library:
import openpyxl
def read_excel(path):
    excel_workbook = openpyxl.load_workbook(path, read_only = True)
    # other logic
    return None
I can call that function like this:
read_excel("C:/Users/anon/Desktop/Current Projects/Test Files/Test.xlsm ")
And it returns this error:
openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support .xlsm file
format, please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,
.xltx,.xltm
That error message confuses me. It's telling me that it doesn't support the .xlsm file format, and that it supports the .xlsm file format. The file opens just fine in Excel, so why won't openpyxl read it?
There is an extra whitespace character in the error message after .xlsm. Remove the whitespace character at the end of the path string you call the function with, and the function runs without error.
read_excel("C:/Users/anon/Desktop/Current Projects/Test Files/Test.xlsm")
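If the path comes from user input or a config file, it may be worth stripping it before handing it to openpyxl. A minimal sketch, assuming the four extensions listed in the error message; clean_excel_path is a hypothetical helper name:

```python
from pathlib import Path

# Extensions listed in the openpyxl error message quoted above.
SUPPORTED = {".xlsx", ".xlsm", ".xltx", ".xltm"}


def clean_excel_path(raw_path):
    """Strip stray whitespace from a path and check its extension."""
    path = Path(raw_path.strip())
    if path.suffix.lower() not in SUPPORTED:
        # repr() makes hidden characters in the suffix visible.
        raise ValueError(f"Unsupported extension: {path.suffix!r}")
    return path
```

With this in place, read_excel(clean_excel_path("C:/Users/anon/Desktop/Current Projects/Test Files/Test.xlsm ")) would receive the path without the trailing space.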
The same problem bothered me a lot today too. I finally updated openpyxl from 2.3.2 to 2.3.5, and the problem disappeared.
Although I am using Anaconda, updating packages with pip can sometimes be worth a try.
I'm using PyQt5 and had the same problem. I found that adding _filter fixed the problem. The full line reads:
fileName, _filter = QtWidgets.QFileDialog.getOpenFileName(None, "Lists", "", "xlsx files *.xlsx")
First, change the current working directory to the folder that contains the file. When passing the file name, copy and paste it rather than typing it manually; the error may come from subtle differences that are hard to spot.