xlrd error when opening Excel files with named ranges

I get the following error when opening a workbook with xlrd 0.9.1 on Python 3.2.4. After some testing, I've traced the issue to the spreadsheet containing named ranges.
Traceback (most recent call last):
File "C:\Users\mandroid\Desktop\xltest.py", line 5, in <module>
book = open_workbook(pth)
File "C:\Python32\lib\site-packages\xlrd\__init__.py", line 416, in open_workbook
ragged_rows=ragged_rows,
File "C:\Python32\lib\site-packages\xlrd\xlsx.py", line 725, in open_workbook_2007_xml
x12book.process_stream(zflo, 'Workbook')
File "C:\Python32\lib\site-packages\xlrd\xlsx.py", line 251, in process_stream
meth(self, elem)
File "C:\Python32\lib\site-packages\xlrd\xlsx.py", line 346, in do_defined_names
self.do_defined_name(child)
File "C:\Python32\lib\site-packages\xlrd\xlsx.py", line 335, in do_defined_name
nobj.formula_text = cooked_text(self, elem)
File "C:\Python32\lib\site-packages\xlrd\xlsx.py", line 130, in cooked_text
return unicode(unescape(t))
TypeError: <lambda>() takes exactly 2 arguments (1 given)
From what I've read, it looks like xlrd has named range functionality, so I'm not sure what could be causing this. Any help is appreciated.

It's a bug in xlrd 0.9.1: https://github.com/python-excel/xlrd/issues/47
You can try 0.9.0, wait for 0.9.2, or apply the fix John Machin suggests in the report.
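As a rough sketch of the "try 0.9.0" route, a script could check the installed version string before opening a workbook. The helper names `is_affected_xlrd` and `install_hint` are made up for illustration, and where the version string comes from (for example an attribute on the `xlrd` module) is an assumption you would need to verify for your install.

```python
def is_affected_xlrd(version: str) -> bool:
    """Return True for the xlrd release named in the bug report above."""
    return version.strip() == "0.9.1"


def install_hint(version: str) -> str:
    """Suggest an action based on the installed xlrd version string."""
    if is_affected_xlrd(version):
        # Pinning to 0.9.0 is one of the options mentioned in the answer.
        return "xlrd 0.9.1 is affected; try 'pip install xlrd==0.9.0'"
    return "xlrd {} is not the release named in the bug report".format(version)


if __name__ == "__main__":
    # The version string is hard-coded here; in a real script it would be
    # read from the installed package (assumption, not shown).
    print(install_hint("0.9.1"))
```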

Related

Pyspark databricks connect library issue

I am currently writing PySpark pipelines using the Databricks Connect library. The steps I followed are given here. The library is installed in a virtual environment.
When I try to execute this code:
spark.read.load(path).first()
I get this error
<class 'TypeError'>, 'JavaPackage' object is not callable, <traceback object at 0x0000017AB70ECF88>
Traceback (most recent call last):
File "D:/Friendsurance/Repository/data-ingestion/job/main.py", line 83, in <module>
run()
File "D:/Friendsurance/Repository/data-ingestion/job/main.py", line 79, in run
el_job.run()
File "D:\Friendsurance\Repository\data-ingestion\job\task\__init__.py", line 18, in run
data: DataFrame = self.extract()
File "D:\Friendsurance\Repository\data-ingestion\job\task\ELTask.py", line 14, in extract
return self.extractor.extract()
File "D:\Friendsurance\Repository\data-ingestion\job\task\extractor\BucketExtractor.py", line 26, in extract
self.spark, self.load_storage.get_path(), self.conf.partition_column
File "D:\Friendsurance\Repository\data-ingestion\job\task\extractor\__init__.py", line 14, in calculate_last_day_run
spark.read.load(path).first().show()
File "D:\Friendsurance\Repository\data-ingestion\venv\lib\site-packages\pyspark\sql\dataframe.py", line 1381, in first
return self.head()
File "D:\Friendsurance\Repository\data-ingestion\venv\lib\site-packages\pyspark\sql\dataframe.py", line 1369, in head
rs = self.head(1)
File "D:\Friendsurance\Repository\data-ingestion\venv\lib\site-packages\pyspark\sql\dataframe.py", line 1371, in head
return self.take(n)
File "D:\Friendsurance\Repository\data-ingestion\venv\lib\site-packages\pyspark\sql\dataframe.py", line 657, in take
return self.limit(num).collect()
File "D:\Friendsurance\Repository\data-ingestion\venv\lib\site-packages\pyspark\sql\dataframe.py", line 596, in collect
if self._sc._conf.get(self._sc._jvm.PythonSecurityUtils.USE_FILE_BASED_COLLECT()):
TypeError: 'JavaPackage' object is not callable
But outside the virtual environment, where I use the pyspark library provided here, the same line runs and gives me the output.
Can anyone tell me what I am doing wrong?

Pandas and glob: convert all xlsx files in folder to csv – TypeError: __init__() got an unexpected keyword argument 'xfid'

I have a folder with many xlsx files that I'd like to convert to csv files.
During my research, I found several threads about this topic, such as this or that one. Based on these, I wrote the following code using glob and pandas:
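import glob
import os

import pandas as pd

path = r'/Users/.../xlsx files'
excel_files = glob.glob(path + '/*.xlsx')

for excel in excel_files:
    # splitext removes only the final extension; split('.')[0] would cut
    # the path at the first dot anywhere in it
    out = os.path.splitext(excel)[0] + '.csv'
    df = pd.read_excel(excel)  # error occurs here
    df.to_csv(out)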
Unfortunately, I got the following error message, which I could not interpret in this context or figure out how to solve:
Traceback (most recent call last):
File "<input>", line 11, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/util/_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 336, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1131, in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 475, in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 391, in __init__
self.book = self.load_workbook(self.handles.handle)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 486, in load_workbook
return load_workbook(
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 317, in load_workbook
reader.read()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 281, in read
apply_stylesheet(self.archive, self.wb)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
return cls(**attrib)
TypeError: __init__() got an unexpected keyword argument 'xfid'
Does anyone know how to fix this? Thanks a lot for your help!

I had the same problem. After some hours of thinking and searching, I realized the problem is actually the file itself. I opened it in MS Excel and saved it again. Alakazam, problem solved.
The file was downloaded, so I thought it was a "security" error or just an error from how the file was created. xD
EDIT:
It's not a security problem, but an error in how the file was generated. The corrected file is about twice the size (in KB) of the broken one.
One solution: if the file can be opened with xlrd==1.2.0, you can then pass the resulting Book (the file opened by xlrd) to read_excel.
import pandas as pd
import xlrd

# df = pd.read_excel('TabelaPrecos.xlsx')
# The line above is the same result
a = xlrd.open_workbook('TabelaPrecos.xlsx')  # open with xlrd==1.2.0 first
b = pd.read_excel(a)  # then hand the opened Book to pandas
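Since the error comes from individual files rather than from the loop, it can help to convert the folder in a way that records which workbooks fail instead of stopping at the first one. The following is only a sketch: `convert_folder` is a hypothetical helper, and the reader is passed in as a parameter so that `pd.read_excel` (or the xlrd route above) can be swapped in.

```python
from pathlib import Path


def convert_folder(folder, read_excel, pattern="*.xlsx"):
    """Convert each matching workbook to CSV.

    Returns (converted, failed): a list of written CSV paths and a list of
    (workbook path, exception) pairs for files that could not be read.
    """
    converted, failed = [], []
    for xlsx in sorted(Path(folder).glob(pattern)):
        out = xlsx.with_suffix(".csv")  # replaces only the final extension
        try:
            read_excel(xlsx).to_csv(out, index=False)
        except Exception as exc:  # e.g. the TypeError about 'xfid'
            failed.append((xlsx, exc))
        else:
            converted.append(out)
    return converted, failed


# Usage sketch (assumes pandas is installed):
#   import pandas as pd
#   converted, failed = convert_folder("/path/to/xlsx files", pd.read_excel)
#   for path, exc in failed:
#       print(path.name, "->", exc)
```

The files listed in `failed` are the candidates for re-saving in Excel as described above.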

How to use dates in the yahoo_fin Python Package

I recently installed yahoo_fin and I tried the following example:
get_calls('NFLX')
It worked. I then tried the following:
get_calls('NFLX', '11/8/2019')
It failed. Here is what I got:
get_calls('NFLX', '11/8/2019')
Traceback (most recent call last):
File "", line 1, in
get_calls('NFLX', '11/8/2019')
File "C:\Users\rsher\Anaconda3\lib\site-packages\yahoo_fin\options.py", line 48, in get_calls
options_chain = get_options_chain(ticker, date)
File "C:\Users\rsher\Anaconda3\lib\site-packages\yahoo_fin\options.py", line 32, in get_options_chain
tables = pd.read_html(site)
File "C:\Users\rsher\Anaconda3\lib\site-packages\pandas\io\html.py", line 906, in read_html
keep_default_na=keep_default_na)
File "C:\Users\rsher\Anaconda3\lib\site-packages\pandas\io\html.py", line 743, in _parse
raise_with_traceback(retained)
File "C:\Users\rsher\Anaconda3\lib\site-packages\pandas\compat\__init__.py", line 344, in raise_with_traceback
raise exc.with_traceback(traceback)
ValueError: No tables found
I am using version 3.6.3 of Python and I am also using Spyder.
Am I doing something wrong? Do you think I have found a bug?
I updated my version of yahoo_fin. Not really sure it was out of date. I now get the following error messages when I run the command: get_calls("nflx", "1/31/20")
Traceback (most recent call last):
File "", line 1, in
get_calls("nflx", "1/31/20")
File "C:\Users\rsher\Anaconda3\lib\site-packages\yahoo_fin\options.py", line 48, in get_calls
options_chain = get_options_chain(ticker, date)
File "C:\Users\rsher\Anaconda3\lib\site-packages\yahoo_fin\options.py", line 32, in get_options_chain
tables = pd.read_html(site)
File "C:\Users\rsher\Anaconda3\lib\site-packages\pandas\io\html.py", line 906, in read_html
keep_default_na=keep_default_na)
File "C:\Users\rsher\Anaconda3\lib\site-packages\pandas\io\html.py", line 743, in _parse
raise_with_traceback(retained)
File "C:\Users\rsher\Anaconda3\lib\site-packages\pandas\compat\__init__.py", line 344, in raise_with_traceback
raise exc.with_traceback(traceback)
ValueError: No tables found

It should work the way you have it.
from yahoo_fin.options import get_calls
get_calls("nflx", "1/31/20")
Are you using the most recent version of yahoo_fin? It should be (as of this writing) version 0.8.4. Another possible issue is that there could have been a problem with Yahoo Finance's page for that option chain at that particular time.
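If the failure is a temporary problem with the page, retrying the call a few times is one way to rule that out. This is a generic sketch, not part of yahoo_fin: `call_with_retry` is a hypothetical helper, and it only assumes that the failing call raises `ValueError`, as in the tracebacks above.

```python
import time


def call_with_retry(fetch, attempts=3, delay=2.0):
    """Call fetch() up to `attempts` times, retrying when it raises ValueError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except ValueError as exc:  # e.g. "No tables found" from pd.read_html
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)  # wait before the next try
    raise last_error


# Usage sketch (assumes yahoo_fin is installed and the network is reachable):
#   from yahoo_fin.options import get_calls
#   calls = call_with_retry(lambda: get_calls("nflx", "1/31/20"))
```

If every attempt still fails, the original `ValueError` is re-raised, which points back to the version or the requested date rather than a transient page problem.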

Basic ExcelFile read not working

I have just downloaded Python 3.5 and have been trying to do a simple task (open an Excel file and remove the first three rows and various columns from the file) for several hours now with no success.
The latest issue happens when I try to open the file.
This is the only code:
import pandas as pd
df = pd.ExcelFile("January2016.xlsx")
I get the following error no matter what read option I use with pandas.
Traceback (most recent call last):
File "C:\Python35\lib\site-packages\IPython\core\interactiveshell.py", line 2847, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-25-503f922e97e7>", line 1, in <module>
df = pd.ExcelFile("January2016.xlsx")
File "C:\Python35\lib\site-packages\pandas\io\excel.py", line 257, in __init__
self.book = xlrd.open_workbook(io)
File "C:\Python35\lib\site-packages\xlrd\__init__.py", line 422, in open_workbook
ragged_rows=ragged_rows,
File "C:\Python35\lib\site-packages\xlrd\xlsx.py", line 833, in open_workbook_2007_xml
x12sheet.process_stream(zflo, heading)
File "C:\Python35\lib\site-packages\xlrd\xlsx.py", line 548, in own_process_stream
self_do_row(elem)
File "C:\Python35\lib\site-packages\xlrd\xlsx.py", line 745, in do_row
value = error_code_from_text[tvalue]
KeyError: None
Please help!

You need to use either pd.ExcelFile.parse(...) or pd.read_excel. pd.ExcelFile by itself doesn't parse Excel files into a DataFrame; you need the parse part, too.

I know this thread is 3 years old, but I hope this might help someone.
I came across this recently. I think the problem is caused by an image in the file. When I run the code on Windows it works well, but when I run it on Ubuntu it fails with a traceback.

OSError: Result too large

I'm playing around with scapy but I can't get it to work. I tried different pieces of code, but they all gave me the same output:
Traceback (most recent call last):
File "<module1>", line 7, in <module>
File "C:\Python26\lib\site-packages\scapy\sendrecv.py", line 357, in srp
s = conf.L2socket(iface=iface, filter=filter, nofilter=nofilter, type=type)
File "C:\Python26\lib\site-packages\scapy\arch\pcapdnet.py", line 313, in __init__
self.outs = dnet.eth(iface)
File "dnet.pyx", line 112, in dnet.eth.__init__
OSError: Result too large
I am using Python 2.6 with all of scapy's dependencies installed.
How can I fix this?
