"Bad magic number for file header" error while using openpyxl - python

I have made a python script which is meant to read an excel spreadsheet and return the value of cell A39. I'm using the openpyxl library.
Here is the part of the code that is giving an error:
cFile = openpyxl.load_workbook('contacts.xlsx', read_only= True)
sheet = cFile.get_sheet_by_name('cSheet')
print sheet['A39'].value
Instead of printing the value of cell A39, which in the spreadsheet is "38", I get the following error:
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 971, in open
raise BadZipfile("Bad magic number for file header")
zipfile.BadZipfile: Bad magic number for file header
The spreadsheet 'contacts.xlsx' is not a zipped file. It is in the same folder as the python script. I made it with Excel 2011. Does anyone know why I'm getting this error or how I can fix it?
Thanks!

When I ran into a similar error, I couldn't understand the cause at first either.
I was also working with the openpyxl library.
My root problem was that a file was being opened and saved at the same time by multiple threads. As a solution, I gave each thread its own file name. You could also use locks.

Related

Question about error when saving xlsm file using xlwings

I want to open an xlsm file via xlwings and then edit it and save it. However, some problems arose.
If I run the code while no Excel file is open, or while another Excel file is open but not being edited, it works fine. However, if I have an Excel file open and am working in it (for example, I open a blank workbook and enter 'test' in cell A1) and then run the code, it sometimes works and sometimes becomes unresponsive on the third line (wb_xl = xw.Book(copy)). When that happens, execution never gets past the third line. What confuses me is that the same code works fine in some cases.
I want to know how to make the code work in all cases.
There is one more problem.
If this code is executed while I am working in another Excel workbook, only wb_xl should be closed. I don't want the other workbook to be closed. However, when app.quit() is executed, every open Excel workbook is closed. How can I close only the workbook opened by the code (wb_xl) without closing the one I am working in?
import xlwings as xw
copy = 'C:/Users/ijung/Desktop/210919_Mk_Lot_test/210922_101test.xlsm'
wb_xl = xw.Book(copy) #sometimes no response in this line
ws_xl = wb_xl.sheets['Main']
app = xw.apps.active
ws_xl.range('A1').value = 'test'
wb_xl.save()
app.quit()
#wb_xl.app.kill()
#wb_xl.close()
I also tried openpyxl. However, the call wb_open.save(copy) raised xml.etree.ElementTree.ParseError: mismatched tag: line 20, column 8. Saving works fine with an xlsx file, but the error occurs with an xlsm file.
import openpyxl
wb_open = openpyxl.load_workbook(copy, read_only = False, keep_vba = True)
ws_open = wb_open.active
ws_open.cell(1,1).value = 'test'
wb_open.save(copy) #error
wb_open.close()
In short, the goal is to open the xlsm file from this code even while I am working in another Excel workbook, edit and save it, and then close only this xlsm file. Trying multiple packages and searching multiple sites has not solved the problem. I'm under a lot of stress with this issue. Any help would be greatly appreciated.
Thanks in advance.
openpyxl does not work with xlsm files that contain form objects.
I think the problem is app.quit(): it closes the whole Excel instance. Just use wb_xl.close() instead.
import xlwings as xw
copy = 'C:/Users/ijung/Desktop/210919_Mk_Lot_test/210922_101test.xlsm'
wb_xl = xw.Book(copy) #sometimes no response in this line
ws_xl = wb_xl.sheets['Main']
#app = xw.apps.active # not needed
ws_xl.range('A1').value = 'test'
wb_xl.save()
wb_xl.close()
This should close only the workbook. Take a look at this post, which has interesting answers.
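Another option, offered only as a sketch, is to open the workbook in its own Excel instance so that quitting that instance cannot affect the one being used interactively. The function name `edit_in_private_excel` and its parameters are hypothetical. It assumes Windows, where `xw.App()` starts a separate Excel process, and it has not been verified against the hang described in the question.

```python
def edit_in_private_excel(path, sheet_name, address, value):
    """Edit one cell in a dedicated Excel instance, leaving other instances open."""
    # Imported here because xlwings needs a local Excel installation.
    import xlwings as xw

    # Start a separate, hidden Excel instance without a default blank workbook.
    app = xw.App(visible=False, add_book=False)
    try:
        wb = app.books.open(path)
        wb.sheets[sheet_name].range(address).value = value
        wb.save()
        wb.close()
    finally:
        # Quits only the instance started above, not the user's own Excel window.
        app.quit()
```

For the file in the question this would be called as `edit_in_private_excel(copy, 'Main', 'A1', 'test')`.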

openpyxl error raise ValueError('Min value is {0}'.format(self.min)) in opening heavy file with formatting

I'm trying to use openpyxl for the first time on a very heavy file: it is over 20,500 KB and has a lot of formatting and a VBA macro.
My code keeps returning the following error:
File " \Anaconda3\lib\site-packages\openpyxl\styles\alignment.py", line 52, in __init__
self.relativeIndent = relativeIndent
File " \Anaconda3\lib\site-packages\openpyxl\descriptors\base.py", line 107, in __set__
raise ValueError('Min value is {0}'.format(self.min))
ValueError: Min value is 0
Would anyone know what the problem is, or how to access the file despite it? I'm trying to write data into an existing Excel file to simplify processes and replace heavy VBA code, so I can't just write it into a different xlsx file and load that from VBA (that would defeat the purpose).
Thanks a lot!
Here is my code :
wb = load_workbook(filename='C:/dev/CodeRep/ProjectName/MainFile 2021_01.xlsm', read_only = False, keep_vba = True)
The traceback says that there is a problem with the Alignment definition in the workbook's stylesheet. openpyxl follows the OOXML specification very closely to minimise unpleasant surprises later. This is why it tends to raise exceptions or give warnings rather than let things pass.
For more details we'll need to see the XML source for the stylesheet, or the Alignments part at least. You can find this by unzipping the XLSM file and looking for the styles.xml file. That will give you more information and also allow you to submit a bug report to openpyxl.
Preprocess the file
I solved this issue by preprocessing the Excel file.
I found that my problem was in "*/myfile.xlsx/xl/styles.xml", where several xf tags had an attribute indent="-1". openpyxl only supports non-negative values and raises that exception when it finds a negative one.
After spending some time trying to override the entire openpyxl class hierarchy in order to catch the exception, I decided to process the XLSX file instead.
Here is my code:
import os
import zipfile

def fix_xlsx(file_name):
    """Rewrite the workbook, replacing indent="-1" with indent="0" in xl/styles.xml."""
    tmp_name = file_name + ".out"
    with zipfile.ZipFile(file_name) as input_file, zipfile.ZipFile(tmp_name, "w") as output_file:
        # Copy every archive member, patching only the stylesheet
        for inzipinfo in input_file.infolist():
            # Read each member as raw bytes so its content is copied unchanged
            data = input_file.read(inzipinfo)
            if "xl/styles.xml" in inzipinfo.filename:
                data = data.replace(b'indent="-1"', b'indent="0"')
            output_file.writestr(inzipinfo.filename, data)
    # Replace the original file with the patched copy
    os.replace(tmp_name, file_name)
Disclaimer:
I must say this is not a very elegant solution, as the entire file is processed and an auxiliary file is used.
Also, I am not enough of an Excel expert to tell whether changing indent="-1" to indent="0" for those tags might cause formatting problems in the file. This is my working solution, and I can't really tell what effect those tags have.
I had the same issue — the file wasn't accepted by Openpyxl.
I just opened the file in MS Excel and saved it to a new file. And it worked after that.
I got the same error and wasn't able to figure out the exact cause, but noticed when I ran my python script in a different environment it worked without issue.
I realized it may have had something to do with the versions of the openpyxl and xlrd packages I was using so I downgraded them to openpyxl==3.0.4 and xlrd==1.2.0 (previously using openpyxl==3.0.7 and xlrd==2.0.1) and that solved my issue.
I ran into this issue. My solution was to pinpoint what was causing the error in the spreadsheet (it had something to do with a table that was recently modified) and reconstruct that table in the worksheet. That was much easier for me than debugging openpyxl or XML.

Pandas.read_excel: Unsupported format, or corrupt file: Expected BOF record

I'm trying to use pandas.read_excel to read in .xls files. It succeeds on most of my .xls files, but then for some it errors out with the following error message:
Unsupported format, or corrupt file: Expected BOF record; found '\x00\x05\x16\x07\x00\x02\x00\x00'
I've been trying to research why this happens for some files but not all. The xlrd version is 1.0.0. I tried to read the files manually with xlrd.open_workbook and got the same result.
Does anyone know what file type this BOF record is referring to?
There are various possible reasons for that error message. However, the main reason could be the Excel file itself. Sometimes, especially if you're pulling an Excel file from a reporting portal, the file can be corrupt, so the best thing would be to open it in Excel, save it as a new .xls file, and then retry pandas.read_excel.
Let me know if it works.
I solved this problem by loading the file with pd.read_table (it loads everything into one column):
df = pd.read_table('path/to/xls_file/' + 'my_file.xls')
then I split this column with
df = df['column_name'].str.split("your_separator", expand=True)
Please check that the file has the right extension, either xlsx or csv. A wrong file extension may cause this issue.

Openpyxl NotImplementedError Only When Loading Workbook

I have been working on a program that writes data into an Excel file using openpyxl, with the option of either loading an existing file or creating a new one. Creating a new file lets me write the data without any problems, but loading an existing file and trying to write new data to new rows raises a NotImplementedError at the line:
ws['A' + str(row)] = gene
even though it was the same for writing to a new file.
Any help would be greatly appreciated!
Update: Thanks Charlie, after removing use_iterators from:
wb = load_workbook(filename=file_name+'.xlsx', use_iterators=True), the code let me write to the file.
If you open a file in read-only mode, why do you expect to be able to edit it? The exception is raised for exactly this reason.
Remove use_iterators when opening the file to avoid this.

python xlrd errors related to file extension changes

I am trying to organize a very large number of .DTA files using the xlrd library.
The first thing I found out was that .DTA files could be opened in Excel just by changing the extension to .xls. Excel shows a warning about a possibly corrupted file when you open one, but otherwise opens it normally:
The file you are trying to open is in a different format than specified by the file extension. Verify that the file is not corrupted and is from a trusted source before opening the file. Do you want to open the file now?
In Python, however, when I try to open the file all I get is an error with no helpful information, which I'm pretty sure is caused by the file extension issue.
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 1323, in getbof
raise XLRDError('Expected BOF record; found 0x%04x' % opcode)
XLRDError: Expected BOF record; found 0x5845
I tried my code by cutting and pasting the data into a new excel file and naming it the same thing and it worked, so I'm pretty sure this is the issue, but I have too many files to be able to do this for each one individually.
Is there a better way to solve this? Suppressing the error, or actually changing the file type and not just its extension somehow?
I think there is a Byte Order Mark at the beginning of the file that is not visible but exists. This answer describes how to remove it: "converting utf-16 -> utf-8 AND remove BOM".
