I am trying to parse metadata out of Excel files. I have tried openpyxl, but it does not give me the desired results.
I would like to read the Company Name and Author Name from the file properties.
Thank You
This is the information I am trying to parse
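One possible approach, sketched here under assumptions rather than given as a definitive answer: an .xlsx file is a ZIP package, and the author and company are conventionally stored in the parts docProps/core.xml (as dc:creator) and docProps/app.xml (as Company). The function name read_xlsx_properties is made up for this example, and the sketch does not apply to legacy .xls files, which are not ZIP packages.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the OOXML document-property parts.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}


def read_xlsx_properties(path_or_file):
    """Return (author, company) from an .xlsx file; None where a value is absent.

    Assumes the property parts sit at their conventional locations
    (docProps/core.xml and docProps/app.xml).
    """
    author = company = None
    with zipfile.ZipFile(path_or_file) as zf:
        names = set(zf.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(zf.read("docProps/core.xml"))
            author = core.findtext("dc:creator", namespaces=NS)
        if "docProps/app.xml" in names:
            app = ET.fromstring(zf.read("docProps/app.xml"))
            company = app.findtext("ep:Company", namespaces=NS)
    return author, company
```

Calling read_xlsx_properties("report.xlsx") (a placeholder file name) would return a tuple such as (author, company), with None for any property the file does not carry.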
Related
I'm setting up a new Jenkins instance, and the job produces a CSV file listing any errors that occurred during the test. I would like to generate an HTML report from this CSV file, which would be more convenient than opening Excel and loading the CSV file to see the errors. I came across the HTML Publisher plugin, but I don't know whether it supports generating HTML reports from CSV files. Alternatively, I could do this with a Python script and expose the resulting HTML file in the build artifacts. Do you have any ideas?
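The Python-script route mentioned in the question could look roughly like the sketch below. It uses only the standard library; the function name csv_to_html, the default title, and the assumption that the first CSV row is a header are illustrative choices, not something the job is known to guarantee.

```python
import csv
import html
import io


def csv_to_html(csv_text, title="Test errors"):
    """Render CSV text as a minimal standalone HTML page with one table.

    Assumes the first row of the CSV is a header row.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        body = "<p>No errors reported.</p>"
    else:
        header, data = rows[0], rows[1:]
        head = "".join(f"<th>{html.escape(c)}</th>" for c in header)
        lines = [f"<tr>{head}</tr>"]
        for row in data:
            # Escape every cell so error text containing < or & displays correctly.
            cells = "".join(f"<td>{html.escape(c)}</td>" for c in row)
            lines.append(f"<tr>{cells}</tr>")
        body = '<table border="1">\n' + "\n".join(lines) + "\n</table>"
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body><h1>{html.escape(title)}</h1>\n{body}\n</body></html>\n"
    )
```

A build step could read the CSV, pass its text to csv_to_html, and write the result to a file such as report.html (a placeholder name) so that it can be archived with the build.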
I want to automate some report creation. Some elements that I need in the report are saved as rich text, i.e. as HTML. There are a couple of libraries that convert HTML to PDF, such as html2pdf or pdfforge. However, I would also like to add extra information to the report that is not in this HTML, for example a title or some information queried from the DB.
Does anyone have a suggestion for how to do this?
Thanks in advance.
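One way to think about it, as a sketch only: build a complete HTML page that wraps the stored rich text together with the extra information, and hand that page to whichever converter is used. The function build_report_html and its parameters are hypothetical, and the stored fragment is inserted unescaped on the assumption that it is trusted HTML.

```python
import html


def build_report_html(title, extra_fields, rich_text_html):
    """Wrap a stored HTML fragment in a full page with a title and extra fields.

    extra_fields is a list of (label, value) pairs, e.g. values queried from a DB.
    rich_text_html is inserted as-is, so it must already be trusted HTML.
    """
    rows = "".join(
        f"<tr><th>{html.escape(str(k))}</th><td>{html.escape(str(v))}</td></tr>"
        for k, v in extra_fields
    )
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head><body>"
        f"<h1>{html.escape(title)}</h1>"
        f"<table>{rows}</table>"
        f'<div class="content">{rich_text_html}</div>'
        "</body></html>"
    )
```

The returned string is ordinary HTML, so the PDF conversion step stays unchanged and only its input grows.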
As the title suggests, how can I import an XML file into a spreadsheet with Python? I need this to document high-level router configurations in spreadsheet format.
I have been using xlsxwriter to create spreadsheets, but I have no idea how to put XML files into a spreadsheet.
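Since the structure of the router XML is not shown, the following is only a generic sketch: it flattens any XML document into rows of (path, attributes, text) using the standard library, which can then be written to a sheet one row at a time. The function name xml_to_rows and the three-column layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def xml_to_rows(xml_text):
    """Flatten an XML document into rows of (path, attributes, text).

    The first row is a header; every element contributes one row,
    in document order.
    """
    root = ET.fromstring(xml_text)
    rows = [("Path", "Attributes", "Text")]

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        attrs = "; ".join(f"{k}={v}" for k, v in elem.attrib.items())
        text = (elem.text or "").strip()
        rows.append((path, attrs, text))
        for child in elem:
            walk(child, path)

    walk(root, "")
    return rows
```

With xlsxwriter, each row could then be written with worksheet.write_row(row_index, 0, row) while iterating over enumerate(rows).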
This is not a duplicate, although the issue has been raised in this forum in 2011 (Getting a hyperlink URL from an Excel document), 2013 (Extracting Hyperlinks From Excel (.xlsx) with Python) and 2014 (Getting the URL from Excel Sheet Hyper links in Python with xlrd); there is still no answer.
After a deep dive into the xlrd module, it seems the Data_sheet.hyperlink_map.get((row, col)) lookup fails because "xlrd cannot read the hyperlink without formatting_info, which is currently not supported for xlsx" per @alecxe at Extracting Hyperlinks From Excel (.xlsx) with Python.
Question: has anyone made progress with extracting URLs from hyperlinks stored in an Excel file? Say that, among all the customer data, there is a column of hyperlinks. I was toying with the idea of dumping the Excel sheet as an HTML page and then scraping it as usual (file on local drive), but that's not a production solution. Supplementary: is there any other module that can extract the URL from a .cell(row,col).value() call on the hyperlink cell? Is there a solution in mechanize? Many thanks.
I had the same problem trying to get the hyperlinks from the cells of an xlsx file. The workaround I came up with was simply converting the Excel sheet to xls format, from which I could get the hyperlinks without any trouble. Once I had finished editing, I converted it back to the original xlsx format.
I don't know if this will work for your specific needs, or if the change of format has consequences I am not aware of, but I think it's worth a try.
I was able to read and use hyperlinks to copy files with openpyxl. Each cell object has a hyperlink attribute (cell_obj.hyperlink), and cell_obj.hyperlink.target holds the link value. I made a list of the cells (row and column) that had hyperlinks, then looped through that list to move the linked files.
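As an alternative to the openpyxl attributes mentioned above, the same information can be read straight from the .xlsx package with the standard library. This is a sketch under assumptions: the worksheet part name xl/worksheets/sheet1.xml is only a common default (the real mapping is recorded in the workbook part), the function name read_hyperlinks is made up, and links created with the HYPERLINK() formula are not covered because they are stored as formulas rather than as hyperlink elements.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"


def read_hyperlinks(path_or_file, sheet_part="xl/worksheets/sheet1.xml"):
    """Return {cell_ref: target} for the hyperlinks of one worksheet part."""
    folder, name = posixpath.split(sheet_part)
    rels_part = f"{folder}/_rels/{name}.rels"
    with zipfile.ZipFile(path_or_file) as zf:
        sheet = ET.fromstring(zf.read(sheet_part))
        targets = {}
        if rels_part in zf.namelist():
            rels = ET.fromstring(zf.read(rels_part))
            for rel in rels.findall(f"{{{PKG_REL}}}Relationship"):
                targets[rel.get("Id")] = rel.get("Target")
    links = {}
    for link in sheet.iter(f"{{{MAIN}}}hyperlink"):
        rel_id = link.get(f"{{{DOC_REL}}}id")
        if rel_id in targets:
            # External link: the URL lives in the sheet's relationships part.
            links[link.get("ref")] = targets[rel_id]
        elif link.get("location"):
            # Internal link to a place in the workbook, e.g. "Sheet2!A1".
            links[link.get("ref")] = "#" + link.get("location")
    return links
```

The keys are cell references as stored in the file (for example "B2", or a range if the link spans several cells), so they can be matched against the column of interest.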
I have a large volume of XLS files. One sheet in each file contains the column headers "name" and "number". Unfortunately, the layout of each XLS file varies, and the name of the sheet that holds the data differs from one file to another.
I am able to parse a sheet using Python 2.7.x to extract the data from specific columns. What I'm now looking to do is open each XLS file, work out which sheet contains the headers "name" and "number", then extract the data in those columns and import it into MySQL.
Any suggestions of how to do this or libraries to use?
XYPath might be worth a look - it lets you query XLS files for the contents of tables, including by the names and contents of columns.
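Independently of the library used to read the files, the search for the right sheet can be sketched in plain Python. The sketch assumes each workbook has already been loaded into a mapping of sheet name to rows (lists of cell values); the function names, the 20-row scan limit, and the case-insensitive header matching are illustrative assumptions.

```python
def find_header_columns(sheets, wanted=("name", "number"), max_scan_rows=20):
    """Locate the first sheet and row holding all wanted headers.

    sheets maps sheet name -> list of rows (each row a list of cell values).
    Returns (sheet_name, header_row_index, {header: column_index}) or None.
    """
    for sheet_name, rows in sheets.items():
        for row_idx, row in enumerate(rows[:max_scan_rows]):
            # Normalise cells so "Name " and "NAME" both match "name".
            seen = {}
            for col_idx, value in enumerate(row):
                seen.setdefault(str(value).strip().lower(), col_idx)
            if all(h in seen for h in wanted):
                return sheet_name, row_idx, {h: seen[h] for h in wanted}
    return None


def extract_columns(sheets, wanted=("name", "number")):
    """Return the rows below the header as tuples ordered like wanted."""
    found = find_header_columns(sheets, wanted)
    if found is None:
        return []
    sheet_name, header_idx, cols = found
    out = []
    for row in sheets[sheet_name][header_idx + 1:]:
        # Short rows yield None for a missing cell instead of raising.
        out.append(tuple(row[cols[h]] if cols[h] < len(row) else None for h in wanted))
    return out
```

With xlrd, the mapping could be built from book.sheets(), using sheet.name as the key and sheet.row_values(i) for each row index up to sheet.nrows; the resulting tuples can then be passed to the MySQL insert step.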