I'm trying to write a parser in python at the moment, that reads nessus reports and generates xlsx files.
Is there a detailed description of the inner workings of xlsx? I have a hard time trying to find out just by looking at the xml files, where I specify which style is applied to which cell on which sheet.
You can find full details of the OfficeOpenXML standard on the ECMA site but why not use one of the existing Python libraries (such as Eric Gazoni's openpyxl) to actually generate the xlsx file rather than building your own?
Related
I have an interest in utilizing the stock quote feature of excel while also creating the file using the xlsxwriter libray in Python. I am familiar with how to write and format text using xlsxwriter but I do not see any option to create a file with certain cells already set to have a stock data type. To be clear, the link from microsoft below basically summarizes the manual process I'm looking to have taken care of in the excel sheet before an actual user ever opens up the file.
https://support.microsoft.com/en-us/office/get-a-stock-quote-e5af3212-e024-4d4c-bea0-623cf07fbc54
I am open to other python based solutions to this issue if the general consensus is that xlsxwriter doesn't support this feature. I really appreciate any advice here.
I am the author of XlsxWriter. I just looked into this and these aren't regular Excel formulas. They have a lot of of additional metadata and richdata helper files associated with them and even the company names aren't standard string types. So unfortunately these aren't, and probably won't be, supported.
They are all open source Python packages to control Excel (see python-excel). I am still trying to understand their code. If anyone could give a hint, do how they connect in a low lever to Excel? Via xml, Excel API, or some other basic Python packages?
If we are talking about reading and writing XLS files, basically xlrd and xlwt follow the OpenOffice.org document/specification describing Excel's format and BIFF (Binary Interchange File Format) records to read and write XLS files. If you would inspect the xlwt source code, you would find it manipulates the BIFF records for everything needs to be written: creating workbook, worksheets, writing data, formatting, alignment etc.
With XLSX the story is a bit different. To read XLSX xlrd relies on the openxmlformats XML schemas and use built into Python ElementTree XML parsers (cElementTree if available, otherwise ElementTree) to parse the XLSX file which is, to simplify, a zip archive containing XML files inside. Here is a good overview of what is inside the archive:
Anatomy of OOXML - xlsx
I would also recommend studying the xlsxwriter module - from my point of view, the package is much better documented and the code is much more cleaner and readable than xlwt or xlrd.
I have built a couple basic workflows using XML tools on top of XLSX workbooks that are mapped to an XML schema. You would enter data into the spreadsheet, export the XML and I had some scripts that would then work with the data.
Now I'm trying to eliminate that step and build a more integrated and portable tool that others could use easily by moving from XSLT/XQuery to Python. I would still like to use Excel for the data entry, but have the Python script read the XLSX file directly.
I found a bunch of easy to use libraries to read from Excel but they need to explicitly state what cells the data is in, like range('A1:C2') etc. The useful thing about using the XML maps was that users could resize or even move tables to fit different rows and rename sheets. Is their a library that would let me select tables as units?
Another approach I tried was to just uncompress the XLSX and just parse the XML directly. The problem with that is that our data is quite complex (taking up to 30-50 sheets) and parsing that in the uncompressed XLSX structure is really daunting. I did find my XML schema within the uncompressed XLSX, so is there any way to reformat the data into this schema outside of Excel? (basically what Excel does when I save a workbook as an .xml file)
The Excel format is pretty complicated with dependencies between components – you can't for example be sure of that the order of the worksheets in the folder worksheets has any bearing to what the file looks like in Excel.
I don't really understand exactly what you're trying to do but the existing libraries present an interface for client code that hides the XML layer. If you don't want that you'll have to root around for the parts that you find useful. In openpyxl you want to look at the stuff in openpyxl/reader specifically worksheet.py.
However, you might have better luck using lxml as this (using libxml2 in the background) will allow you load a single XML into Python and manipulate it directly using the .objectify() method. We don't do this in openpyxl because XML trees consume a lot of memory (and many people have very large worksheets) but the library for working with Powerpoint shows just how easy this can be.
We have a project in python with django.
We need to generate complex word, excel and pdf files.
For the rest of our projects which were done in PHP we used PHPexcel ,
PHPWord and tcpdf for PDF.
What libraries for python would you recommend for creating this kind of files ? (for excel and word its imortant to use the open xml file format xlsx , docx)
Python-docx may help ( https://github.com/mikemaccana/python-docx ).
Python doesn't have highly-developed tools to manipulate word documents. I've found the java library xdocreport ( https://code.google.com/p/xdocreport/ ) to be the best by far for Word reporting. Because I need to generate PCL, which is efficiently done via FOP I also use docx4j.
To integrate this with my python, I use the spark framework to wrap it up with a simple web service, and use requests on the python side to talk to the service.
For excel, there's openpyxl, which actually is a python port of PHPexcel, afaik. I haven't used it yet, but it sounds ok to me.
I would recommend using Docutils. It takes reStructuredText files and converts them to a range of output files. Included in the package are HTML, LaTeX and .odf file writers but in the sandbox there are a whole load of other writers for writing to other formats, see for example, the WordML writer (disclaimer: I haven't used it).
The advantage of this solution is that you can write plain text (reStructuredText) master files, which are human readable as is, and then convert to a range of other file formats as required.
Whilst not a Python solution, you should also look at Pandoc a Haskell library which supports a much wider range of output and input formats than docutils. One major advantage of Pandoc over Docutils is that you can do the reverse translation, i.e. WordML to reStructuredText. You can try Pandoc here.
I have never used any libraries for this, but you can change the extension of any docx, xlsx file to zip, and see the magic!
Generating openxml files is as simple as generating couple of XML files (you can use templates) and zipping it.
Simplest way to generate PDF is to generate HTML (with CSS+images) and convert it using wkhtmltopdf tool.
What I need to know is, can I get Python to read a spreadsheet (preferably Microsoft Excel), then parse the information and input it into an equation?
It's for a horse-racing program, where the information for several horses will be in one excel spreadsheet, in different rows or columns. I need to know if I can run a calculation for each of those horses separately and then calculate a score for the given horse.
My suggestion is:
Save the Excel file as a csv comma separated value file, which is a plain text format and much easier to work with.
Use Python's built-in csv module to work with the data in csv format.
You can work with Excel files directly in Python (Excel 2003 format supported via the third party modules xlwt, xlrd) but this is much harder than working with CSV.
OpenPyXL ("A Python library to read/write Excel 2007 xlsx/xlsm files") has a very nice and Pythonic API.
Use xlrd package. It's on PyPI, so you can just easy_install xlrd
You can export the spreadsheet as a .csv and read it in as a text file, then process it. I have a niggling feeling there might even a CSV parsing python library.
AFAIK there isn't a .xls parser, although I might be wrong.
EDIT: I was wrong: http://www.python-excel.org/