Problems when opening xlsx with openpyxl

Problems when opening xlsx with openpyxl - python

I have a xlsx file and I tried to load this file using openpyxl
from openpyxl import load_workbook
wb = load_workbook('/home/file_path/file.xlsx')
But I get this error:
"wb = load_workbook(new_file)"): expected string or buffer
new_file is a variable with the path of the xlsx file trying to open. Does anybody knows why this happens or how I should change to read the file? Thanks!
Update More details about the error
/home/vagrant/scrapy/local/lib/python2.7/site-packages/openpyxl/reader/worksheet.py:322: UserWarning: Unknown extension is not supported and will be removed
warn(msg)
/home/vagrant/scrapy/local/lib/python2.7/site-packages/openpyxl/reader/worksheet.py:322: UserWarning: Conditional Formatting extension is not supported and will be removed
warn(msg)
Traceback (most recent call last):
File "/vagrant/vagrant_conf/pycharm-debug.egg/pydevd_comm.py", line 1071, in doIt
result = pydevd_vars.evaluateExpression(self.thread_id, self.frame_id, self.expression, self.doExec)
File "/vagrant/vagrant_conf/pycharm-debug.egg/pydevd_vars.py", line 344, in evaluateExpression
Exec(expression, updated_globals, frame.f_locals)
File "/vagrant/vagrant_conf/pycharm-debug.egg/pydevd_exec.py", line 3, in Exec
exec exp in global_vars, local_vars
File "<string>", line 1, in <module>
File "/home/vagrant/scrapy/local/lib/python2.7/site-packages/openpyxl/reader/excel.py", line 252, in load_workbook
wb._named_ranges = list(read_named_ranges(archive.read(ARC_WORKBOOK), wb))
File "/home/vagrant/scrapy/local/lib/python2.7/site-packages/openpyxl/workbook/names/named_range.py", line 130, in read_named_ranges
if external_range(node_text):
File "/home/vagrant/scrapy/local/lib/python2.7/site-packages/openpyxl/workbook/names/named_range.py", line 112, in external_range
m = EXTERNAL_RE.match(range_string)
TypeError: expected string or buffer

The syntaxe is:
wb = load_workbook(filename='file.xlsx', read_only=True)
The read_only keyword is not required.

Related

Python openpyxl telling me " No such file or directory:" even though python file is in same directory as the excel file, among others

Beginner here.
I've tried making new excel files, running the python code while the excel file is open, tried renaming the excel file, but nothing works. I was just following this youtube tutorial: https://www.youtube.com/watch?v=7YS6YDQKFh0&t=14s
from openpyxl import Workbook, load_workbook
workbook = load_workbook("Test2.xlsx")
The error:
Traceback (most recent call last):
File "c:\Users\Name\Desktop\Python Projects\Excel Manipulation\Excel_Test.py", line 3, in <module>
workbook = load_workbook("Test2.xlsx")
File "C:\Users\Name\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\reader\excel.py", line 315, in load_workbook
reader = ExcelReader(filename, read_only, keep_vba,
File "C:\Users\Name\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\reader\excel.py", line 124, in __init__
self.archive = _validate_archive(fn)
File "C:\Users\Name\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\reader\excel.py", line 96, in _validate_archive
archive = ZipFile(filename, 'r')
File "C:\Users\Name\AppData\Local\Programs\Python\Python39\lib\zipfile.py", line 1239, in __init__
self.fp = io.open(file, filemode)
**FileNotFoundError: [Errno 2] No such file or directory: 'Test2.xlsx'
PS C:\Users\Name>**

I found a solution. Turns out you just need to copy the file location, convert it to raw string, and add \ExcelFile to the end:
from openpyxl import Workbook, load_workbook
location = r'C:\Users\Name\Desktop\Python Projects\Excel Manipulation\Test2.xlsx'
workbook = load_workbook(location)
Derp.

pandas reading excel results in "not a zip file"

I try to read a xlsx into a data frame:
itut_ir = pd.read_excel('C:\\Users\\Administrator\\Downloads\\reportdata.xlsx')
print(itut_ir.to_string())
I receive this:
Traceback (most recent call last):
File
"C:\Users\Administrator\eclipse-workspace\Reports\GOW\Report.py",
line 44, in
df = pd.read_excel('C:\Users\Administrator\Downloads\reportdata.xlsx')
File
"C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel_base.py",
line 304, in read_excel
io = ExcelFile(io, engine=engine) File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel_base.py",
line 824, in init
self._reader = self.enginesengine File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel_xlrd.py",
line 21, in init
super().init(filepath_or_buffer) File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel_base.py",
line 353, in init
self.book = self.load_workbook(filepath_or_buffer) File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel_xlrd.py",
line 36, in load_workbook
return open_workbook(filepath_or_buffer) File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\xlrd_init.py",
line 117, in open_workbook
zf = zipfile.ZipFile(filename) File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\zipfile.py",
line 1222, in init
self._RealGetContents() File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\zipfile.py",
line 1289, in _RealGetContents
raise BadZipFile("File is not a zip file") zipfile.BadZipFile: File is not a zip file
does anybody have an idea? the file does not seem to be broken, I can open it with Excel.
thanks!
*** UPDATE ***
the file producing the error is being downloaded from FTP. opening the original file works ... if that gives you a hint :) thanks

I had the same issue just a little bit ago with an XLSX that I created in LibreOffice.
The solution was to check the XLSX to make sure it wasn't corrupted. In my case, loading a previous version of the XLSX file corrected the problem.

Using Pandas to read excel from url

I am working on a personal project to analyze COVID19 data. Presently, I am download the excel sheet provided by ourworldindata.org, available at this url -> https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data.xlsx
However, when i try to execute the command in pandas (below), I get a list of errors. What could be the root cause ?
url = 'https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data.xlsx'
df = pd.read_excel(url, sheet_name='Sheet1')
Error
Traceback (most recent call last): File "<input>", line 1, in <module> File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
io = ExcelFile(io, engine=engine) File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_base.py", line 824, in __init__
self._reader = self._engines[engine](self._io) File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_xlrd.py", line 21, in __init__
super().__init__(filepath_or_buffer) File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_base.py", line 351, in __init__
self.book = self.load_workbook(filepath_or_buffer) File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_xlrd.py", line 34, in load_workbook
return open_workbook(file_contents=data) File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\xlrd\__init__.py", line 157, in open_workbook
ragged_rows=ragged_rows, File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\xlrd\book.py", line 92, in open_workbook_xls
biff_version = bk.getbof(XL_WORKBOOK_GLOBALS) File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\xlrd\book.py", line 1278, in getbof
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8]) File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\xlrd\book.py", line 1272, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg) xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\n\n\n\n\n<!D'
Please not that pandas can read the excel if I download it on my computer

Try the link to raw excel file:
import pandas as pd
url='https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data.xlsx?raw=true'
df=pd.read_excel(url, sheet_name='Sheet1')

You can do it with requests
import pandas as pd
import io
import requests
url = 'https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data.xlsx'
get_content = requests.get(url).content
df = pd.read_csv(io.StringIO(get_content .decode('utf-8')))
I do this to avoid using local drive or google drive , and saves time of connection.

Why can't I edit my .xlsx file with openpyxl?

I am encountering a problem when running with openpyxl the code below
import openpyxl
import os
wb = openpyxl.load_workbook('example.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')
sheet["A1"].value
sheet["A1"].value == None
sheet["A1"].value = 42
sheet["A3"].value = 'Hello'
os.chdir("/Users/mac/Desktop")
wb.save('exceeeel.xlsx')
The error is
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/openpyxl/reader/excel.py", line 312, in load_wo
rkbook
reader = ExcelReader(filename, read_only, keep_vba,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/openpyxl/reader/excel.py", line 124, in __init_
_
self.archive = _validate_archive(fn)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/openpyxl/reader/excel.py", line 96, in _validat
e_archive
What am I doing wrong? I am using the current version of the openpyxl library.

I can't provide a confident answer because the question only includes a partial traceback. That being said, it looks like the traceback one would get for a FileNotFoundError:
C:\Python37\python.exe C:/Users/user/PycharmProjects/scratch/scratch2.py
Traceback (most recent call last):
File "C:/Users/user/PycharmProjects/scratch/scratch2.py", line 3, in <module>
wb = openpyxl.load_workbook('example.xlsx')
File "C:\Python37\lib\site-packages\openpyxl\reader\excel.py", line 313, in load_workbook
data_only, keep_links)
File "C:\Python37\lib\site-packages\openpyxl\reader\excel.py", line 124, in __init__
self.archive = _validate_archive(fn)
File "C:\Python37\lib\site-packages\openpyxl\reader\excel.py", line 96, in _validate_archive
archive = ZipFile(filename, 'r')
File "C:\Python37\lib\zipfile.py", line 1207, in __init__
self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: 'example.xlsx'
Process finished with exit code 1
This error will be raised when the path your provided to the file you want to load with openpyxl.load_workbook does not contain the specified file. Since the only argument you provided in your call of that function is 'example.xlsx' that probably means there is no file in the folder you are running this script from.
If this 'example.xlsx' file is in a different folder then you'll want to either specify the relative path to that file as your argument or move the file into the same folder as your script.
If this isn't what's going on then you'll need to provide the full traceback that you are seeing on your end in order to get a better answer.

Faild to save a xlxs file twice with comments using openpyxl

This is my codes. When I try to save a xlxs with comments, It failed. How can I know when to save again.
from openpyxl import load_workbook
import datetime
filename = u"large_table.xlsx"
model = load_workbook(filename)
model.properties.lastPrinted = datetime.datetime.now()
model.save(filename)
model.properties.lastPrinted = datetime.datetime.now()
model.save(filename)
Traceback: It seems that self.workbook.vba_archive is set to None unexpectedly.
Traceback (most recent call last):
File "D:/h32workspace/trunk/event_editor/eric6/model/test_file.py", line 31, in <module>
model.save(filename)
File "C:\Python27\lib\site-packages\openpyxl\workbook\workbook.py", line 342, in save
save_workbook(self, filename)
File "C:\Python27\lib\site-packages\openpyxl\writer\excel.py", line 269, in save_workbook
writer.save(filename)
File "C:\Python27\lib\site-packages\openpyxl\writer\excel.py", line 251, in save
self.write_data()
File "C:\Python27\lib\site-packages\openpyxl\writer\excel.py", line 81, in write_data
self._write_worksheets()
File "C:\Python27\lib\site-packages\openpyxl\writer\excel.py", line 214, in _write_worksheets
self._write_comment(ws)
File "C:\Python27\lib\site-packages\openpyxl\writer\excel.py", line 184, in _write_comment
vml = fromstring(self.workbook.vba_archive.read(ws.legacy_drawing))
AttributeError: 'NoneType' object has no attribute 'read'
I tried to use keep_vba=True to load workbook, but if failed to save file correctly. The saved file can not be opened.

I used your code to save a sample .xlsx file. It saved without any issues.
Do you have any macro within you .xlsx file?
If yes, you may want to open the xlsx file with macro enabled using
model = load_workbook(filename, keep_vba=True)
See here for details on openpyxl usage with macro.
Also, try to save to a different filename than trying to overwrite original to make sure it works correctly.
fileout = "test2.xlsx"
model.save(fileout)
Hope this helps.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Problems when opening xlsx with openpyxl - python

The syntaxe is: wb = load_workbook(filename='file.xlsx', read_only=True) The read_only keyword is not required.

Related

Python openpyxl telling me " No such file or directory:" even though python file is in same directory as the excel file, among others

pandas reading excel results in "not a zip file"

Using Pandas to read excel from url

Why can't I edit my .xlsx file with openpyxl?

Faild to save a xlxs file twice with comments using openpyxl

Categories

Resources