Basic ExcelFile read not working


I have just downloaded Python 3.5 and have spent several hours trying to do a simple task (open an Excel file and remove the first three rows and various columns from it), with no success.
The latest issue occurs when I try to open the file.
This is the only code:
import pandas as pd
df = pd.ExcelFile("January2016.xlsx")
I get the following error no matter what read option I use with pandas.
Traceback (most recent call last):
File "C:\Python35\lib\site-packages\IPython\core\interactiveshell.py", line 2847, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-25-503f922e97e7>", line 1, in <module>
df = pd.ExcelFile("January2016.xlsx")
File "C:\Python35\lib\site-packages\pandas\io\excel.py", line 257, in __init__
self.book = xlrd.open_workbook(io)
File "C:\Python35\lib\site-packages\xlrd\__init__.py", line 422, in open_workbook
ragged_rows=ragged_rows,
File "C:\Python35\lib\site-packages\xlrd\xlsx.py", line 833, in open_workbook_2007_xml
x12sheet.process_stream(zflo, heading)
File "C:\Python35\lib\site-packages\xlrd\xlsx.py", line 548, in own_process_stream
self_do_row(elem)
File "C:\Python35\lib\site-packages\xlrd\xlsx.py", line 745, in do_row
value = error_code_from_text[tvalue]
KeyError: None
Please help!

You need to use either pd.ExcelFile(...).parse(...) or pd.read_excel(...). pd.ExcelFile is not a function that parses an Excel file into a DataFrame; it only opens the workbook, so you need the parse step too.
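To make the difference concrete, here is a minimal sketch of the task from the question (skip the first three rows, drop some columns). The helper name load_trimmed and the column names in the usage comment are hypothetical, and it assumes an Excel engine such as openpyxl is installed for .xlsx files. It does not address the KeyError in the traceback, which is raised while the workbook is being opened.

```python
import pandas as pd


def load_trimmed(path, sheet_name=0, skip_rows=3, drop_columns=()):
    """Read one sheet, skipping the leading rows and dropping unwanted columns."""
    # pd.ExcelFile only opens the workbook; .parse() turns a sheet into a DataFrame.
    with pd.ExcelFile(path) as workbook:
        frame = workbook.parse(sheet_name, skiprows=skip_rows)
    # errors="ignore" keeps the sketch from failing if a listed column is absent.
    return frame.drop(columns=list(drop_columns), errors="ignore")


# Usage (file and column names are placeholders):
# df = load_trimmed("January2016.xlsx", drop_columns=["Notes", "Total"])
```

pd.read_excel(path, skiprows=3) should give the same DataFrame in one call; pd.ExcelFile is mainly useful when you want to read several sheets from the same workbook.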

I know this thread is 3 years old, but I hope this might help someone.
I came across this recently. I think the problem is caused by an image in the file. When I run the code in a Windows environment it works well, but when I run it in an Ubuntu environment it fails with a traceback.

Related

Python dataframe_image results in SyntaxError: not a PNG file

Good day. I used to have working code that exports a styled dataframe as a PNG. For some reason it no longer works, except on certain machines used by my coworkers. I suspect it is somehow related to the latest Windows or Chrome updates, but I am not sure.
Sample code:
import numpy as np
import pandas as pd
import dataframe_image as dfi
my_array = np.array([[11,22,33],[44,55,66]])
df = pd.DataFrame(my_array, columns = ['Column_A','Column_B','Column_C'])
df = df.style.set_properties(**{'background-color': 'black', 'color': 'white'})
display(df)
dfi.export(df, 'Test.png', table_conversion='chrome')
Received error:
Traceback (most recent call last):
File "C:\Users\Anato\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3457, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 13, in
dfi.export(df, 'Test.png', table_conversion='chrome')
File "C:\Users\Anato\anaconda3\lib\site-packages\dataframe_image_pandas_accessor.py", line 24, in export
dpi=None
File "C:\Users\Anato\anaconda3\lib\site-packages\dataframe_image_pandas_accessor.py", line 73, in _export
File "C:\Users\Anato\anaconda3\lib\site-packages\dataframe_image_screenshot.py", line 167, in run
max_crop = int(img.shape[1] * 0.15)
File "", line 40, in take_screenshot_override
img = mimage.imread(buffer)
File "C:\Users\Anato\anaconda3\lib\site-packages\matplotlib\image.py", line 1541, in imread
with img_open(fname) as image:
File "C:\Users\Anato\anaconda3\lib\site-packages\PIL\ImageFile.py", line 121, in init
self._open()
File "C:\Users\Anato\anaconda3\lib\site-packages\PIL\PngImagePlugin.py", line 677, in _open
raise SyntaxError("not a PNG file")
File "", line unknown
SyntaxError: not a PNG file
I searched the web and found no answer that could help. I tried updating the packages and Python itself. I believe it has to do with the latest system updates, but I have found no solution for over a week.

In my case, the following worked:
Update Windows to the latest version
Update conda

Unable to read_excel using pandas on CentOS Stream 9 VM: zipfile.BadZipFile: Bad magic number for file header

I've been running a script for several months now where I read and concatenate several Excel exports using the following code:
import os
import pandas as pd

excel_list = []
files = os.listdir(os.path.abspath('exports/'))
for file in files:
    if file.startswith('ap_statistics_') and file.endswith('.xlsx'):
        excel_list.append(pd.read_excel('exports/' + file, sheet_name='Access Points'))
df = pd.concat(excel_list, axis=0, ignore_index=True)
This has worked just fine until this Saturday when I uploaded new exports to the CentOS Stream 9 VM where I have a cronjob running the script every hour.
Now I always get this error:
Traceback (most recent call last):
File "/root/projects/beacon_check_v8/main.py", line 310, in <module>
ap_check()
File "/root/projects/beacon_check_v8/main.py", line 260, in ap_check
siteaps_result = getaps()
File "/root/projects/beacon_check_v8/main.py", line 30, in getaps
excel_list.append(pd.read_excel('exports/' + file, sheet_name='Access Points'))
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/pandas/io/excel/_base.py", line 457, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/pandas/io/excel/_base.py", line 1419, in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 525, in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/pandas/io/excel/_base.py", line 518, in __init__
self.book = self.load_workbook(self.handles.handle)
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 536, in load_workbook
return load_workbook(
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/openpyxl/reader/excel.py", line 317, in load_workbook
reader.read()
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/openpyxl/reader/excel.py", line 277, in read
self.read_strings()
File "/root/projects/beacon_check_v8/venv/lib64/python3.9/site-packages/openpyxl/reader/excel.py", line 143, in read_strings
with self.archive.open(strings_path,) as src:
File "/usr/lib64/python3.9/zipfile.py", line 1523, in open
raise BadZipFile("Bad magic number for file header")
zipfile.BadZipFile: Bad magic number for file header
I develop on my Windows 10 notebook using PyCharm with a Python 3.9 venv, same as on the VM, where the script continued to work just fine.
When researching online all I found was that sometimes .pyc files can cause issues so I created a completely new venv on the VM, installed all libraries (netmiko, pandas, openpyxl, etc.) and tried running the script again before and after deleting all .pyc files in the directory but no luck.
I have extracted the Excel file header using the following code:
with open('exports/' + file, 'rb') as myexcel:
    print(myexcel.read(4))
Unfortunately it comes back as the same values on both my Windows venv as well as the CentOS venv:
b'PK\x03\x04'
I don't know if this header value is correct or not but I can read the files on my Windows notebook just fine using pandas or excel.
Any help would be greatly appreciated.

The issue was actually the program I used to transfer the files between my notebook and the VM, WinSCP. I don't know why or how this caused the error but I was able to fix it by transferring directly over pscp.
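For anyone hitting the same thing: b'PK\x03\x04' is the normal ZIP local file header signature, so the first four bytes being correct only shows that the start of the file survived. A rough way to check whether the rest of the .xlsx (which is a ZIP archive) arrived intact is sketched below, using only the standard library. The function names are made up for illustration. Why WinSCP damaged the file is not confirmed in this thread; a text-mode transfer is one possibility, but that is an assumption.

```python
import hashlib
import zipfile
import zlib


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def xlsx_is_intact(path):
    """Return True if every member of the ZIP container reads back cleanly."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the first member that fails its check, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, zlib.error):
        # The archive structure itself is damaged.
        return False


# Usage (path is a placeholder): run on both machines and compare the output.
# print(sha256_of("exports/ap_statistics_1.xlsx"))
# print(xlsx_is_intact("exports/ap_statistics_1.xlsx"))
```

If the digests differ between the notebook and the VM, the file was changed in transit, and no change to the pandas code will fix it.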

Pandas and xlrd error while reading excel files

I've been working on a Python script that deals with creating Pandas data frames from Excel files. For the past few days, the Pandas method worked perfectly with the usual pd.read_excel() method.
Today I've been trying to run the same code, but am running into errors. I've tried using the following code on a small test document (just two columns, 5 rows with simple integers):
import pandas as pd
pd.read_excel("tstr.xlsx")
I'm getting this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
return func(*args, **kwargs)
File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 867, in __init__
self._reader = self._engines[engine](self._io)
File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_xlrd.py", line 22, in __init__
super().__init__(filepath_or_buffer)
File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 353, in __init__
self.book = self.load_workbook(filepath_or_buffer)
File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_xlrd.py", line 37, in load_workbook
return open_workbook(filepath_or_buffer)
File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\xlrd\__init__.py", line 130, in open_workbook
bk = xlsx.open_workbook_2007_xml(
File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\xlrd\xlsx.py", line 812, in open_workbook_2007_xml
x12book.process_stream(zflo, 'Workbook')
File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\xlrd\xlsx.py", line 266, in process_stream
for elem in self.tree.iter() if Element_has_iter else self.tree.getiterator():
AttributeError: 'ElementTree' object has no attribute 'getiterator'
I get the exact same issue when trying to load excel files with xlrd directly. I've tried with several different excel files, and all of my pip installations are up-to-date.
I haven't made any changes to my system since pd.read_excel was last working perfectly (I did reboot my system, but it didn't involve any updates). I'm using a Windows 10 machine, if that's relevant.
Has anyone else had this issue? Any advice on how to proceed?

There can be many different causes of this error, but try adding engine='xlrd' or another supported value (most commonly "openpyxl"). This may solve your issue, as it depends more on the Excel file than on your code.
Also, try using the full path to the file instead of a relative one.

openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support the old .xls file format, please use xlrd to read this file, or convert it to the more recent .xlsx file format.
So for me the argument:
engine="xlrd" worked on .xls
engine="openpyxl" worked on .xlsx
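If you read both formats, the mapping above can be wrapped in a small helper. This is only a sketch: pick_engine and ENGINE_BY_SUFFIX are names made up for illustration, and the mapping covers just the two extensions mentioned in this answer.

```python
from pathlib import Path

# Engine per file extension, as reported in the answer above.
ENGINE_BY_SUFFIX = {".xls": "xlrd", ".xlsx": "openpyxl"}


def pick_engine(path):
    """Return the pandas Excel engine name for a path, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no engine configured for {suffix!r} files") from None


# Usage (assumes pandas and the chosen engine are installed):
# import pandas as pd
# df = pd.read_excel("tstr.xlsx", engine=pick_engine("tstr.xlsx"))
```

Raising on an unknown extension is a deliberate choice here, so that a stray .csv or .ods file fails loudly instead of being handed to the wrong engine.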

This works for me
#Back to linux prompt and install openpyxl
pip install openpyxl
#Add engine='openpyxl' in the python argument
data = pd.read_excel(path, sheet_name='Sheet1', parse_dates=True, engine='openpyxl')

Uncompyle2 issue: array indices must be integers

I have an issue when I am trying to decompile a .pyc file.
The traceback is the following:
Traceback (most recent call last):
File "my.py", line 4, in <module>
uncompyle2.uncompyle_file("/home/user/Downloads/asd.pyc", fileobj)
File "/home/user/Desktop/uncompyle2/uncompyle2/__init__.py", line 130, in uncompyle_file
uncompyle(version, co, outstream, showasm, showast, deob)
File "/home/user/Desktop/uncompyle2/uncompyle2/__init__.py", line 93, in uncompyle
tokens, customize = scanner.disassemble(co, deob=deob)
File "/home/user/Desktop/uncompyle2/uncompyle2/Scanner.py", line 214, in disassemble
cf = self.find_jump_targets(code)
File "/home/user/Desktop/uncompyle2/uncompyle2/Scanner.py", line 926, in find_jump_targets
self.detect_structure(i, op)
File "/home/user/Desktop/uncompyle2/uncompyle2/Scanner.py", line 737, in detect_structure
if int(self.code[jmp]) == RETURN_VALUE:
TypeError: array indices must be integers
Any ideas about this?
I'm using Python 2.7.6 on an Ubuntu machine.
The command that I'm running is the following:
uncompyle2 asd.pyc
//EDIT: As far as I can tell, this happens only on a specific file (asd.py). It works on other files.
Any workaround?

The .pyc file that you're trying to decompile is probably obfuscated. It's not uncompyle2's job to deobfuscate the file as well.
Try something else, like pyc2py. Maybe it will work.

xlrd error when opening Excel files with named ranges

I'm getting the following error message when attempting to open a workbook using xlrd 0.9.1 on Python 3.2.4. I tested to see what could be causing the issue and narrowed it down to the spreadsheet having named ranges.
Traceback (most recent call last):
File "C:\Users\mandroid\Desktop\xltest.py", line 5, in <module>
book = open_workbook(pth)
File "C:\Python32\lib\site-packages\xlrd\__init__.py", line 416, in open_workbook
ragged_rows=ragged_rows,
File "C:\Python32\lib\site-packages\xlrd\xlsx.py", line 725, in open_workbook_2007_xml
x12book.process_stream(zflo, 'Workbook')
File "C:\Python32\lib\site-packages\xlrd\xlsx.py", line 251, in process_stream
meth(self, elem)
File "C:\Python32\lib\site-packages\xlrd\xlsx.py", line 346, in do_defined_names
self.do_defined_name(child)
File "C:\Python32\lib\site-packages\xlrd\xlsx.py", line 335, in do_defined_name
nobj.formula_text = cooked_text(self, elem)
File "C:\Python32\lib\site-packages\xlrd\xlsx.py", line 130, in cooked_text
return unicode(unescape(t))
TypeError: <lambda>() takes exactly 2 arguments (1 given)
From what I've read, it looks like xlrd has named range functionality, so I'm not sure what could be causing this. Any help is appreciated.

It's a bug in xlrd 0.9.1: https://github.com/python-excel/xlrd/issues/47
You can try 0.9.0, wait for 0.9.2, or apply the fix John Machin suggests in the report.
