Adding XML Source to xlsx file in python - python

I am trying to create an xlsx file from a template exported from Microsoft Dynamics NAV, so that I can upload my file to the system.
I am able to recreate and fill the template using the xlsxwriter library, but I have discovered that the template file also has an attached XML source (visible in the Developer tab in Excel).
I can easily modify the XML file to match what I want, but I can't find a way to add the XML source to the xlsx file.
I have searched for "python adding xlsx xml source" but it doesn't seem to give me anything I can use.
Any help would be greatly appreciated.
Best regards
Martin

An xlsx file is basically a zip archive. Open it as an archive and you'll probably be able to find the XML file and modify it. – Mak Sim, yesterday
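To make the comment above concrete, here is a minimal sketch that treats the workbook as a zip archive using only the standard library. The helper names `list_parts` and `write_part` are made up for illustration. Which part actually holds the XML source in the NAV template is not stated here, so compare the part list of the template with the part list of the generated file to find out. A newly added part may also need matching entries in `[Content_Types].xml` and in a relationships (`.rels`) part before Excel accepts it.

```python
import zipfile


def list_parts(xlsx_file):
    """Return the names of all parts stored in the xlsx package."""
    with zipfile.ZipFile(xlsx_file) as archive:
        return archive.namelist()


def write_part(src, dst, part_name, data):
    """Copy the package from src to dst, adding or replacing one part.

    Zip archives cannot be edited in place, so every other part is
    copied unchanged into a new archive.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            if item.filename != part_name:
                zout.writestr(item, zin.read(item.filename))
        zout.writestr(part_name, data)
```

For example, `list_parts("template.xlsx")` (a placeholder file name) shows what the exported template contains, and `write_part` can then copy a part with the modified XML into the file produced by xlsxwriter.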

Related

XML and Excel Structures, debugging and etc

I'm currently working on this project: https://github.com/lucasmolinari/unlocker-EX.
It's an Excel unlocker that works by editing the XML files inside workbooks (more information on the GitHub page).
The script works fine on workbooks with almost no content, but I have recently been testing some bigger workbooks. When I open the unlocked file, Excel says it's corrupted, and I can't find any difference between the original and the unlocked workbook. I'm 100% sure the problem occurs when the script changes the content of the file: I watched every step of the script, and it only stops working once the files are edited.
Does anyone know more about how XML files work or about the structure of Excel workbooks? Or is there some way to compare the original file with the edited one to see whether it's a formatting problem? I'm really sorry about this question, but I have no idea where to start now; I have tried everything I can.
I changed the script to open files as UTF-8 and tried to find corrupted characters in the edited file, but finding them manually is too hard.
Using the ElementTree library solves the problem.
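As an illustration of the ElementTree approach, here is a minimal sketch that parses a worksheet part and removes an element instead of editing the text by hand. It assumes the edit in question is removing `<sheetProtection>`; the project's actual edits may differ, and `remove_sheet_protection` is a made-up name. Real worksheets often declare further namespaces, and ElementTree renames any prefix that has not been registered, so those may need `ET.register_namespace` calls as well. The `xml_declaration` argument needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by worksheet parts in xlsx files.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def remove_sheet_protection(sheet_xml):
    """Return the worksheet XML (bytes) without <sheetProtection> elements."""
    # Keep the main namespace as the default one instead of an "ns0:" prefix.
    ET.register_namespace("", MAIN_NS)
    root = ET.fromstring(sheet_xml)
    # Only direct children of <worksheet> are checked in this sketch.
    for element in root.findall(f"{{{MAIN_NS}}}sheetProtection"):
        root.remove(element)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Because the document is parsed and serialized as a tree, the output stays well-formed XML, which is harder to guarantee with string replacement.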

Avoid pop-ups in Excel while running code in Python

I want to be able to open an Excel document and start manipulating the data without seeing any pop-ups.
I think the pop-ups are what stop my Excel file from opening successfully. Here are the pop-ups I see in Excel; I would like to answer them automatically instead of manually. I found some answers online, but none that fit my case.
The file format and extension of "xxx" don't match. The file could be
corrupted or unsafe. Unless you trust its source, don't open it. Do
you want to open it anyway?
Options: Yes, No, Help
or
Open XML: Please select how you would like to open this file:
As an XML table
As a read-only workbook
Use the XML Source task pane
or
XML Import Error
Options: OK, Help
After I select Yes, then "As an XML table", then OK, everything works perfectly. If anyone could help me out, I would much appreciate it.

How to load in Python an xlsx that originally had .xls file extension?

I'm using xlrd to process .xls files, and openpyxl to process .xlsx files, and this is working well.
Then I'm handed what is ostensibly a .xls file, so I try xlrd.open_workbook() and get:
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '<?xml ve'
I take a look at this question, and I surmise that my file, although ending with extension .xls, must actually be a .xlsx. And indeed, I can view it in a text editor:
<?xml version="1.0" encoding="UTF-8"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
:
:
:
(for privacy reasons, I can't post the whole file, but it's probably not required for our analysis).
So I surmise that if I just copy (cp) it to a .xlsx, I should be able to open it with openpyxl.load_workbook(), but I get:
BadZipfile: File is not a zip file
If it's actually an xls (unlikely) but can't be opened with xlrd, and if it's actually an xlsx but can't be opened with openpyxl even after I cp it to a .xlsx, what should I do?
Note: If I open the .xls in Excel, save it as a .xlsx, and retry with openpyxl, it loads fine, but this manual step is not a luxury I will have when my program runs.
One thing is clear: The file you're trying to open has a different format than its extension suggests.
As you already know, Excel file formats include (but are not limited to) xls and xlsx.
The Excel 2003 format (xls) is a binary format. This means that if you open an xls file with a text editor, you'll just see gibberish.
The Excel 2007 format (xlsx) is quite different. An xlsx file is a zip file with a bunch of XML files inside. You can use a zip archiver to extract the contents of the xlsx file and then edit the XML files with any text editor. However, opening an xlsx file directly with a text editor is like opening a zip file with a text editor: you'll just see gibberish.
The fact that you can open your file with a text editor (and read its contents) shows that it's neither an xls file nor an xlsx file. Your file is neither a binary file nor a zip file; it's a plain XML file.
Moreover, this error message says a lot.
BadZipfile: File is not a zip file
It means that openpyxl is trying to open your file as an xlsx file and therefore as a zip file. But when it tries to extract the contents, it fails, because your file isn't a zip file at all.
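The reasoning above can be turned into a small check that looks at the first bytes of the file rather than its extension. This is only a sketch: the function names are made up, and the signatures used are the well-known ones for OLE2 compound files (the container of binary xls) and for zip archives. An empty zip archive starts with a different signature and is not covered.

```python
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # container of binary .xls
ZIP_MAGIC = b"PK\x03\x04"                          # zip archive, e.g. .xlsx
UTF8_BOM = b"\xef\xbb\xbf"


def classify_header(head):
    """Guess the container format from the first bytes of a file."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    # Skip an optional UTF-8 byte order mark and leading whitespace.
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if head.lstrip().startswith(b"<?xml"):
        return "xml"
    return "unknown"


def sniff_file(path):
    """Read the first bytes of the file at path and classify them."""
    with open(path, "rb") as handle:
        return classify_header(handle.read(64))
```

For the file in the question, whose first bytes are `<?xml ve` according to the xlrd error, `classify_header` returns `"xml"`.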
But if the file is neither a xlsx file nor a xls file, how can Microsoft Excel read it? I wondered that too. After some research, I believe your file has the XML Spreadsheet 2003 file format. This example looks very similar to the file content you posted. Since Microsoft Excel supports this format, it's no wonder that it can read your file.
Unfortunately, Python libraries such as xlrd and openpyxl only support xls and xlsx file formats, so they won't be able to read your file. I think you'll just have to manually convert it to a supported format.
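Since the file is plain XML, one possible alternative to converting it by hand is to read it with the standard library's XML parser. The sketch below assumes the usual XML Spreadsheet 2003 layout (Workbook, Worksheet, Table, Row, Cell, Data) in the namespace shown in the question. The function name is made up, every value is returned as text, and sparse rows that rely on the `ss:Index` attribute are not handled.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the <Workbook> element shown in the question.
SS = "urn:schemas-microsoft-com:office:spreadsheet"


def read_spreadsheetml_2003(source):
    """Return {worksheet name: rows}; each row is a list of cell texts.

    source may be a file name or a binary file object.
    """
    root = ET.parse(source).getroot()
    sheets = {}
    for sheet in root.iter(f"{{{SS}}}Worksheet"):
        rows = []
        for row in sheet.iter(f"{{{SS}}}Row"):
            cells = []
            for cell in row.findall(f"{{{SS}}}Cell"):
                data = cell.find(f"{{{SS}}}Data")
                # A cell without a <Data> child is treated as empty.
                cells.append(data.text if data is not None else None)
            rows.append(cells)
        sheets[sheet.get(f"{{{SS}}}Name")] = rows
    return sheets
```

The rows could then be written to a real xlsx file with a library such as openpyxl if later steps need that format.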
I am not on OS X, so this is not tested. You may be able to use the appscript package, despite its lack of support, to open the offending file and then resave it.
from appscript import app, k
excel = app('Microsoft Excel')
wb = excel.open('/path/to/file.xls')
wb.save_as('/path/to/fileout.xlsx', file_format=k.XLSX_file_format)
# Not sure about the exact name of the file format constant (k.XLSX_file_format is a guess).
I had a similar problem. It turned out that it needed the absolute file path. E.g., "c:/dir/filename.xlsx" instead of "filename.xlsx". Relative paths worked on osx, but not on Windows.

Why does zipfile.is_zipfile return True on xlsx file?

I am using is_zipfile to check whether a file is a zip file before extracting it. But the method returns True for an Excel file held in a StringIO object. I am using Python 2.7. Does anyone know how to fix this? Is it reliable to use is_zipfile? Thanks.
Quoting from Microsoft's XLSX structure overview doc:
Workbook data is contained in a ZIP package conforming to the Open
Packaging Conventions
So .xlsx files really are zip files. If you don't want to treat them as zip files, you can exclude them with a condition like this:
if os.path.splitext(filename)[1] != ".xlsx" and zipfile.is_zipfile(filename):
This is because xlsx is actually a valid zip file.
See also:
Office Open XML
The Microsoft Office XML formats
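When the data comes from a StringIO object there may be no file name to check, so another option is to look inside the archive. The sketch below assumes that an xlsx package contains `[Content_Types].xml` (required by the Open Packaging Conventions) and the workbook part at its conventional location `xl/workbook.xml`. The function name is made up, and the second check is a heuristic rather than a guarantee.

```python
import zipfile


def looks_like_xlsx(file):
    """Return True if file (a path or a seekable binary file object)
    is a zip archive that contains the typical xlsx parts."""
    if not zipfile.is_zipfile(file):
        return False
    with zipfile.ZipFile(file) as archive:
        names = set(archive.namelist())
    return "[Content_Types].xml" in names and "xl/workbook.xml" in names
```

A plain zip archive then passes `zipfile.is_zipfile` but fails `looks_like_xlsx`, which lets the two cases be handled separately.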

xlsx writing - where specified?

I'm currently trying to write a parser in Python that reads Nessus reports and generates xlsx files.
Is there a detailed description of the inner workings of xlsx? Just by looking at the XML files, I have a hard time finding out where to specify which style is applied to which cell on which sheet.
You can find full details of the OfficeOpenXML standard on the ECMA site, but why not use one of the existing Python libraries (such as Eric Gazoni's openpyxl) to actually generate the xlsx file rather than building your own?
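On the specific question of where styles are attached: in SpreadsheetML, each cell element `<c>` in a worksheet part can carry an `s` attribute, which is a zero-based index into the `<cellXfs>` list in `xl/styles.xml`. The sketch below only reads those indexes from a worksheet's XML. The function name is made up, and a missing `s` attribute is treated as index 0.

```python
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by worksheet parts in xlsx files.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def cell_style_indexes(sheet_xml):
    """Map each cell reference (e.g. "A1") to its style index ("s")."""
    root = ET.fromstring(sheet_xml)
    return {
        cell.get("r"): int(cell.get("s", "0"))
        for cell in root.iter(f"{{{MAIN_NS}}}c")
    }
```

Running this on a sheet extracted from a workbook saved by Excel shows which `cellXfs` entry each cell points at, which can help when reading the standard alongside real files.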
