xlrd cannot read xlsx file downloaded from email attachment - python

This is a very strange issue. I have quite a large Excel file (I cannot discuss its contents, as it is sensitive data) that is a .xlsx and IS a valid Excel file.
When I download it from my email, save it on my desktop and try to open the workbook using xlrd, xlrd throws an AssertionError that does not show me what went wrong.
When I open the file using my file browser and then save it (without making any changes), it works perfectly with xlrd.
Has anyone faced this issue before? I tried passing various flags to the open_workbook function to no avail, and I tried googling the error. So far I haven't found anything.
The method I used was as follows:
from xlrd import open_workbook
with open('bigexcelfile.xlsx', 'rb') as file:  # binary mode: an .xlsx file is a zip archive
    fileString = file.read()
wb = open_workbook(file_contents=fileString)
Please help! The error is as follows:
Traceback (most recent call last):
File "./varify/samples/resources.py", line 354, in post
workbook = xlrd.open_workbook(file_contents=fileString)
File "/home/vagrant/varify-env/lib/python2.7/site-packages/xlrd/__init__.py", line 416, in open_workbook
ragged_rows=ragged_rows,
File "/home/vagrant/varify-env/lib/python2.7/site-packages/xlrd/xlsx.py", line 791, in open_workbook_2007_xml
x12sheet.process_stream(zflo, heading)
File "/home/vagrant/varify-env/lib/python2.7/site-packages/xlrd/xlsx.py", line 528, in own_process_stream
self_do_row(elem)
File "/home/vagrant/varify-env/lib/python2.7/site-packages/xlrd/xlsx.py", line 722, in do_row
assert tvalue is not None
AssertionError

Use Save As to save your Excel file as .xls instead of .xlsx (simply renaming the file does not change its format).
Thank you.

Use openpyxl, not xlrd, for this format: https://openpyxl.readthedocs.org/en/latest/
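One way to check whether the emailed attachment differs structurally from the copy re-saved through the file browser is to inspect both as zip archives, since an .xlsx file is a zip package. The sketch below uses only the standard library's zipfile module; the function name inspect_xlsx is hypothetical, and comparing the two results is a diagnostic step, not a fix.

```python
import io
import zipfile


def inspect_xlsx(data):
    """Summarise the zip-level structure of the raw bytes of an .xlsx file."""
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):
        # A genuine .xlsx is a zip package; anything else is mislabelled.
        return {"is_zip": False, "bad_member": None, "members": []}
    with zipfile.ZipFile(buf) as zf:
        return {
            "is_zip": True,
            # testzip() returns the name of the first member whose CRC check
            # fails, or None if every member checks out.
            "bad_member": zf.testzip(),
            "members": sorted(zf.namelist()),
        }
```

To compare, read each file with open(path, 'rb').read(), pass the bytes to inspect_xlsx, and look for differences in the member lists or a non-None bad_member.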

Related

Pandas and xlrd error while reading excel files

I've been working on a Python script that creates pandas DataFrames from Excel files. For the past few days, the usual pd.read_excel() method worked perfectly.
Today I've been trying to run the same code, but am running into errors. I've tried using the following code on a small test document (just two columns, 5 rows with simple integers):
import pandas as pd
pd.read_excel("tstr.xlsx")
I'm getting this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
return func(*args, **kwargs)
File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 867, in __init__
self._reader = self._engines[engine](self._io)
File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_xlrd.py", line 22, in __init__
super().__init__(filepath_or_buffer)
File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 353, in __init__
self.book = self.load_workbook(filepath_or_buffer)
File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_xlrd.py", line 37, in load_workbook
return open_workbook(filepath_or_buffer)
File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\xlrd\__init__.py", line 130, in open_workbook
bk = xlsx.open_workbook_2007_xml(
File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\xlrd\xlsx.py", line 812, in open_workbook_2007_xml
x12book.process_stream(zflo, 'Workbook')
File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\xlrd\xlsx.py", line 266, in process_stream
for elem in self.tree.iter() if Element_has_iter else self.tree.getiterator():
AttributeError: 'ElementTree' object has no attribute 'getiterator'
I get the exact same issue when trying to load excel files with xlrd directly. I've tried with several different excel files, and all of my pip installations are up-to-date.
I haven't made any changes to my system since pd.read_excel was last working perfectly (I did reboot my system, but it didn't involve any updates). I'm using a Windows 10 machine, if that's relevant.
Has anyone else had this issue? Any advice on how to proceed?
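The last frame of the traceback shows xlrd falling back to ElementTree.getiterator(). That method was deprecated in Python 3.2 and removed in Python 3.9, which matches the Python39 install in the traceback. The minimal sketch below only illustrates the standard-library change (iter() replaces getiterator()); the XML snippet is made up for illustration, and this is not a patch for xlrd itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical workbook-like XML, just to have something to walk.
XML = "<workbook><sheets><sheet name='A'/><sheet name='B'/></sheets></workbook>"

tree = ET.ElementTree(ET.fromstring(XML))

# iter() is the supported way to walk the tree; getiterator() no longer
# exists on ElementTree or Element objects in Python 3.9+.
sheet_names = [elem.get("name") for elem in tree.iter("sheet")]
print(sheet_names)  # ['A', 'B']
```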
There can be many different causes of this error, but try adding engine='xlrd' or another possible value (usually "openpyxl"). It may solve your issue, as it depends more on the Excel file than on your code.
Also, try using the full path to the file instead of a relative one.
openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support the old .xls file format, please use xlrd to read this file, or convert it to the more recent .xlsx file format.
So for me the argument:
engine="xlrd" worked on .xls
engine="openpyxl" worked on .xlsx
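The rule in this answer can be written as a small lookup so that the engine follows the file extension. This is a sketch: the helper name excel_engine is hypothetical, and it trusts the extension, which does not always match a file's real contents.

```python
from pathlib import Path

# Engine per extension, as reported in the answer above.
ENGINES = {".xls": "xlrd", ".xlsx": "openpyxl"}


def excel_engine(path):
    """Return the pandas read_excel engine name for a file's extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in ENGINES:
        raise ValueError("no engine configured for %r" % suffix)
    return ENGINES[suffix]
```

Usage: pd.read_excel(path, engine=excel_engine(path)).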
This works for me:
# At the Linux prompt, install openpyxl
pip install openpyxl
# Then pass engine='openpyxl' to read_excel
import pandas as pd
data = pd.read_excel(path, sheet_name='Sheet1', parse_dates=True, engine='openpyxl')

Pandas can't open CSV file: FileNotFoundError: [Errno 2] File xyz.csv does not exist

import pandas as pd
df=pd.read_csv('Catalogue.csv')
print(df)
I downloaded an earthquake CSV file, but pandas doesn't see the file. I use VS Code and Python 3.8.3. I put the CSV file in the same folder as the .py file where I write my code.
Even when I ran the same code in a Jupyter Notebook folder (with the CSV in the same folder as the notebook), the result was the same.
I gather that for Excel files you need to pip install xlrd. I tried pip install python-csv but could not install it. Is it needed, though? Or do I need to fix the CSV file (commas or spaces)?
Full output:
Traceback (most recent call last):
File "c:/Users/Fatma Elik/Documents/VS Code/BTK/CSVCSV.py", line 2, in <module>
df=pd.read_csv('Catalogue.csv')
File "C:\Users\Fatma Elik\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\Fatma Elik\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "C:\Users\Fatma Elik\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "C:\Users\Fatma Elik\AppData\Local\Programs\Python\Python38-32\lib\der(src, **kwds)
File "pandas\_libs\parsers.pyx", line 374, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File Catalogue.csv does not exist: 'Catalogue.csv'
Thanks everyone!
Please try using the full path:
import pandas as pd
df = pd.read_csv("C:/Users/Fatma Elik/Documents/VS Code/BTK/Catalogue.csv")
print(df)
There are quite a few scenarios in which this situation can occur. Here are a few common suggestions to try.
Case 1 - Location where you run Python
Your file path is correct relative to the location of the .py file, but incorrect relative to the directory from which you run python (a relative path is resolved against the current working directory).
For example, let's say CSVCSV.py is located in ~/script/, and Catalogue.csv is also located in ~/script/.
If you run python script/CSVCSV.py from ~/ , you will get the FileNotFound error. However, if from ~/script/ you run python CSVCSV.py, it will work.
In your case specifically, are you perhaps running python from .../BTK or from .../VS Code? Judging by the traceback, I would guess you are running python c:/Users/Fatma Elik/Documents/VS Code/BTK/CSVCSV.py from a directory other than .../BTK.
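Case 1 can be avoided by building the path from the script's own location instead of relying on the working directory. A minimal sketch with pathlib; path_next_to is a hypothetical helper name.

```python
from pathlib import Path


def path_next_to(script_file, filename):
    """Absolute path to `filename` in the same folder as `script_file`.

    A bare relative path such as 'Catalogue.csv' is resolved against the
    current working directory, not against the script's folder.
    """
    return Path(script_file).resolve().parent / filename
```

In CSVCSV.py this would be used as df = pd.read_csv(path_next_to(__file__, 'Catalogue.csv')), which works no matter which directory python is started from.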
Case 2 - Try using the full directory path
Have you tried df = pd.read_csv("C:/Users/Fatma Elik/Documents/VS Code/BTK/Catalogue.csv")?
This situation usually occurs when you try to write a file into a directory that does not exist. For example, you may be trying to write records to data/hist/sign_seqs.csv while the hist directory is not present.
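For the writing case, creating the missing folders before opening the file avoids the error. A sketch using the standard library's pathlib and csv modules; write_rows is a hypothetical helper name.

```python
import csv
from pathlib import Path


def write_rows(path, rows):
    """Write rows to a CSV file, creating any missing parent folders first."""
    path = Path(path)
    # Without this step, writing data/hist/sign_seqs.csv raises
    # FileNotFoundError when data/hist does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path
```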

Unable to open .xls file in Python, xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '<?xml ve'

I am trying to extract data from an Excel file, but unfortunately the xlrd library is not opening the file and is throwing a lot of errors. For reference, I have a .xls file with 10+ pages of data.
I've tried to use the xlrd library with no luck, here is my code:
import xlrd
file = r"C:\TestAutomation\doc\Smart_CID.xls"  # raw string so the backslashes are kept literally
wb = xlrd.open_workbook(file)
print(wb.nsheets)
Here is the Traceback:
Traceback (most recent call last):
File "C:/TestAutomation/src/XML_parser.py", line 7, in <module>
wb = xlrd.open_workbook(file)
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 157, in open_workbook
ragged_rows=ragged_rows,
File "C:\Python27\lib\site-packages\xlrd\book.py", line 92, in open_workbook_xls
biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
File "C:\Python27\lib\site-packages\xlrd\book.py", line 1278, in getbof
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
File "C:\Python27\lib\site-packages\xlrd\book.py", line 1272, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '<?xml ve'
For further reference, this Excel file is generated by a company website, which produces an Excel file as well as an XML file. How can I open this Excel file?
I believe the file might not be a proper .xls file. Open it in Notepad to check: the '<?xml ve' in the error message (the first bytes xlrd found in the file) indicates that it is actually XML.
See this post for a similar situation.
Your file is probably really a .xml file, which is not supported by xlrd.
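The Notepad check can also be scripted by looking at the file's first bytes: legacy .xls files start with the OLE2 signature, .xlsx files are zip archives starting with PK\x03\x04, and XML files start with '<?xml'. A minimal sketch under those assumptions; the function names are hypothetical, and "xml" only says the file is XML, not which XML format it follows.

```python
# Leading bytes ("magic numbers") of the file types discussed here.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (BIFF inside OLE2)
ZIP_MAGIC = b"PK\x03\x04"                          # zip archive, e.g. .xlsx
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_format(head):
    """Guess a spreadsheet file's real format from its first bytes."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if head.lstrip().startswith(b"<?xml"):
        return "xml"
    return "unknown"


def sniff_file(path):
    """Read the first 16 bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))
```

For the file in the question, sniff_file(r"C:\TestAutomation\doc\Smart_CID.xls") would be expected to return "xml", given the '<?xml ve' in the traceback.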

Failed to save an .xlsx file twice with comments using openpyxl

This is my code. When I try to save an .xlsx file with comments a second time, it fails. How can I save it again?
from openpyxl import load_workbook
import datetime
filename = u"large_table.xlsx"
model = load_workbook(filename)
model.properties.lastPrinted = datetime.datetime.now()
model.save(filename)
model.properties.lastPrinted = datetime.datetime.now()
model.save(filename)
Traceback: It seems that self.workbook.vba_archive is set to None unexpectedly.
Traceback (most recent call last):
File "D:/h32workspace/trunk/event_editor/eric6/model/test_file.py", line 31, in <module>
model.save(filename)
File "C:\Python27\lib\site-packages\openpyxl\workbook\workbook.py", line 342, in save
save_workbook(self, filename)
File "C:\Python27\lib\site-packages\openpyxl\writer\excel.py", line 269, in save_workbook
writer.save(filename)
File "C:\Python27\lib\site-packages\openpyxl\writer\excel.py", line 251, in save
self.write_data()
File "C:\Python27\lib\site-packages\openpyxl\writer\excel.py", line 81, in write_data
self._write_worksheets()
File "C:\Python27\lib\site-packages\openpyxl\writer\excel.py", line 214, in _write_worksheets
self._write_comment(ws)
File "C:\Python27\lib\site-packages\openpyxl\writer\excel.py", line 184, in _write_comment
vml = fromstring(self.workbook.vba_archive.read(ws.legacy_drawing))
AttributeError: 'NoneType' object has no attribute 'read'
I tried using keep_vba=True to load the workbook, but then the file was not saved correctly: the saved file cannot be opened.
I used your code to save a sample .xlsx file. It saved without any issues.
Do you have any macros in your .xlsx file?
If so, you may want to load the xlsx file with macros preserved using
model = load_workbook(filename, keep_vba=True)
See here for details on using openpyxl with macros.
Also, try saving to a different filename instead of overwriting the original, to make sure it works correctly.
fileout = "test2.xlsx"
model.save(fileout)
Hope this helps.
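Building on the suggestion to save to a different filename, a common pattern is to save to a temporary file in the same folder and only then replace the original, so a failed save cannot leave a half-written workbook behind. This is a sketch, assuming Python 3.3+ for os.replace: save_via_temp is a hypothetical helper meant to be called as save_via_temp(model.save, filename), and it protects the original file but does not by itself fix the vba_archive error.

```python
import os
import tempfile


def save_via_temp(save, target):
    """Save to a temporary file next to `target`, then swap it into place.

    `save` is any callable that writes a file at the path it is given,
    for example an openpyxl Workbook's bound method `model.save`.
    The original file is only replaced after `save` returns without raising.
    """
    folder = os.path.dirname(os.path.abspath(target))
    suffix = os.path.splitext(target)[1]
    fd, tmp_path = tempfile.mkstemp(suffix=suffix, dir=folder)
    os.close(fd)  # `save` reopens the path itself
    try:
        save(tmp_path)
        # Same folder, so this is a rename (atomic on POSIX filesystems).
        os.replace(tmp_path, target)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```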

Python: 'NoneType' object has no attribute 'decompressobj'

I'm using Python 2.7.11 on Ubuntu.
I'm trying to open an Excel file (.xlsx) in Python using the xlrd package. However, I get the following error when I use the package's open_workbook() function to open my Excel file:
Traceback (most recent call last):
File "TileInserter.py", line 15, in <module>
book = open_workbook(sheetPath, on_demand=True)
File "/usr/local/lib/python2.7/site-packages/xlrd/__init__.py", line 422, in open_workbook
ragged_rows=ragged_rows,
File "/usr/local/lib/python2.7/site-packages/xlrd/xlsx.py", line 761, in open_workbook_2007_xml
zflo = zf.open(component_names['xl/_rels/workbook.xml.rels'])
File "/usr/local/lib/python2.7/zipfile.py", line 1010, in open
close_fileobj=should_close)
File "/usr/local/lib/python2.7/zipfile.py", line 526, in __init__
self._decompressor = zlib.decompressobj(-15)
AttributeError: 'NoneType' object has no attribute 'decompressobj'
I googled the cause of this error and found that it can happen if the zlib library is not installed. But when I checked using PHP's phpinfo() function, it shows that zlib is installed, and it is the latest version (1.2.8).
So I'm kinda stuck now. Does anyone know how to solve this issue?
EDIT: My actual code in TileInserter.py is as follows (TileInserter.py and TileList.xlsx are in the same directory):
from xlrd import open_workbook

sheetPath = "TileList.xlsx"
# some more variables

# Open Excel file
book = open_workbook(sheetPath, on_demand=True)
for name in book.sheet_names():
    if name.endswith('1'):
        sheet = book.sheet_by_name(name)
I see on http://www.python-excel.org/ that there's a library openpyxl that is recommended for working with .xlsx files. This may be what you need instead of xlrd.
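phpinfo() describes the zlib that PHP uses; it says nothing about the Python interpreter in /usr/local. The traceback shows that zipfile's module-level zlib is None, and zipfile sets it to None when its own `import zlib` fails. A quick check from the same interpreter, as a minimal sketch (zlib_status is a hypothetical helper name):

```python
def zlib_status():
    """Return this interpreter's zlib version string, or None if missing.

    zipfile does essentially the same import at load time; if it fails,
    zipfile.zlib is None and reading a compressed member fails with
    "'NoneType' object has no attribute 'decompressobj'".
    """
    try:
        import zlib
    except ImportError:
        return None
    return getattr(zlib, "ZLIB_VERSION", "unknown")


print("zlib available to Python:", zlib_status())
```

Run it with the same python binary that runs TileInserter.py; a None result means that interpreter lacks its zlib module, regardless of what PHP reports.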
